Abstract

A number of questions regarding programs involving heap-based data
structures can be phrased as questions about numeric properties of those
structures. A data structure traversal might terminate if the length of some
path is eventually zero, or a function to remove n elements from a collection
may only be safe if the collection has size at least n.

In this thesis, we develop proof methods for reasoning about the connection
between heap-manipulating programs and numeric programs. In addition, we
develop an automatic method for producing numeric abstractions of
heap-manipulating programs. These numeric abstractions are expressed as
simple imperative programs over integer variables and have the feature that
if a property holds of the numeric program, then it also holds of the original,
heap-manipulating program. This is true for both safety and liveness. The
abstraction procedure makes use of a shape analysis based on separation logic
and has support for user-defined inductive data structures.

We also discuss a number of applications of this technique. Numeric
abstractions, once obtained, can be analyzed with a variety of existing
verification tools. Termination provers can be used to reason about termination
of the numeric abstraction, and thus termination of the original program. Safety
checkers can be used to reason about assertion safety. And bound inference
tools can be used to obtain bounds on the values of program variables. With
small changes to the program source, bounds analysis also allows the
computation of symbolic bounds on memory use and computational complexity.

Chapter 1
Introduction 


Current  static  analysis  tools  can  check  a  wide  variety  of  both  safety  and  liveness  properties 
for  programs  involving  integer  variables.  Tools  such  as  Blast  [Henzinger  et  al.,  2002], 
SLAM  [Ball  et  al.,  2001],  ARMC  [Podelski  and  Rybalchenko,  2007],  Astree  [Cousot 
et  al.,  2005],  Speed  [Gulwani  et  al.,  2009]  and  Terminator  [Cook  et  al.,  2006]  all  focus 
on  this  class  of  programs.  Some  of  these  also  have  support  for  pointers,  but  the  heap 
reasoning  is  generally  kept  as  simple  as  possible  for  the  given  problem  domain. 

Difficulty occurs when we try to integrate these methods with very precise methods for
heap analysis. Such combinations generally involve a large increase in complexity, both
in terms of the verification problem and in the implementation. In this thesis, we offer a
solution to this problem in the form of an automatic analysis method that proves program
properties by converting a heap-manipulating program into a numeric program that can
then be analyzed by analysis tools that only support integer-valued variables.

The numeric program may include additional variables, called instrumentation
variables, which are not present in the input program. These variables track numeric
properties of heap-based data structures, such as the height of a tree, the maximal element
in a list of integers, or the length of a path between two points in a data structure. Safety
and liveness of the numeric program can be analyzed and the results carried over to the
original heap-manipulating program. Bounds on variables are also preserved, which, when
combined with additional instrumentation, allows us to use the numeric program to calculate
bounds on execution time and memory usage.


1.1  Approach 

The  approach  taken  by  this  thesis  is  to  prove  properties  of  heap  programs  by  reducing  them 
to  numeric  programs  using  a  static  analysis  based  on  separation  logic.  As  such,  there  are 
two  main  questions  to  address:  “Why  use  separation  logic?”  and  “Why  generate  numeric 
programs?” 


Why Separation Logic? Work such as [Magill et al., 2006, Distefano et al., 2006, Chang
et al., 2007, Calcagno et al., 2009, Yang et al., 2008] has firmly established separation
logic as a viable basis for automated program analysis. Its suitability stems from its focus
on local reasoning [O’Hearn et al., 2001], which means that when performing analysis
of a piece of code, we need only consider memory used by that code, rather than the
global heap. This allows us to break the verification problem into several smaller
sub-problems and enables results to be re-used in different contexts, all of which helps
improve scalability of analyses based on separation logic.

In addition, the inductive predicates used by separation logic to define data structures
can be viewed as specifying the connection between the concrete pointer structures
manipulated by a program and more abstract properties of these structures. We leverage this
ability of separation logic in our static analysis to establish a link between concrete pointer
structures and associated size measures. Such measures include obvious counts, such as
“the size of the list starting at x”, as well as less obvious metrics, such as “the number of
nodes in the tree at root which are to the left of the path from root to curr.” These measures
are critical when proving termination and other liveness properties, as well as being useful
for safety properties.

Why Numeric Programs? Given that there are techniques that prove termination of
pointer programs directly [Brotherston et al., 2008b, Berdine et al., 2006, Loginov et al.,
2006b], one might wonder why it is useful to introduce the added complication of
translating pointer programs to numeric programs and then proving termination of these
numeric programs. One answer is that, in many ways, using numeric programs as an
intermediate form actually simplifies the program analysis. Termination proving itself is a
complex process of computing transitive closures and inferring ranking functions [Podelski
and Rybalchenko, 2004, Cook et al., 2006]. By making the generation of numeric programs
the end goal of the shape analysis, we insulate it from the complexities of termination proving
(and shape analysis already has plenty of complexity itself). Furthermore, by studying
what we can prove while still separating heap analysis from numeric analysis, we are
able to investigate the interplay between the fundamentally structural notion of heap and
fundamentally arithmetic termination arguments.

Finally, because the technique of generating numeric programs makes use of
termination analysis in a “black box” fashion, we can benefit immediately from advances in
termination proving without requiring any changes to the work described and implemented
in this thesis. Given that there is a large and active community doing termination research
[Bradley et al., 2005b,a, Cook et al., 2009b, 2008, Giesl et al., 2006], this is a major benefit
of our approach. This same argument applies to other applications of this technique, such
as computing bounds or proving safety properties. Furthermore, a significant advantage of
this approach is the fact that the same numeric abstraction can be used to produce safety
proofs, termination proofs, and bounds on variable values. This significantly reduces the
amount of work that must be done to prove multiple properties of a program.


1.2  Contributions 

The  contributions  of  this  thesis  are  as  follows: 

1. We develop a theory of instrumented programs as a means of relating
heap-manipulating programs and numeric abstractions. Instrumented programs use
separation logic annotations to connect the commands in the numeric abstraction with
the states of the original program.

2. A static analysis that automates the generation of numeric abstractions. This
aspect of the work involves the specification of a proof system for separation logic
assertions, a strategy for proof search in this system, and the definition of symbolic
execution and abstraction rules for separation logic formulas involving inductive
predicates. These components are all augmented with rules for generating numeric
commands that describe how data structure manipulations change numeric
properties of data structures. These commands form the building blocks from which the
numeric abstraction is constructed.

3. An implementation of the static analysis described above that supports the analysis
of C programs. It accepts user-specified inductive data structure definitions and thus
allows support for new data structures to be added fairly easily. Experimental results
involving a number of examples and various data structures are given. Our
experiments also consider multiple program properties, including safety, termination, and
memory bounds.

1.3  Example 

We conclude this chapter with an example that concretely demonstrates our approach.
Consider the function traverse in Figure 1.1. This C-style code performs a left-to-right,
depth-first traversal of the tree at root. It does this by maintaining a stack of nodes
to be processed. The stack is a linked list with nodes of type TreeList and initially
contains a single node with a pointer to the root of the tree. On each iteration, the top
element of the stack is removed and its children are added. Empty trees are discarded and
when the entire stack is empty, execution terminates.

There are a number of properties one might want to prove about this code. First, we
might like to show that it terminates on all valid inputs. We might also be interested
in obtaining a bound on the amount of memory allocated by the procedure. Both these
questions are really questions about numeric properties of the code. In the case of
termination, we want to demonstrate that some ranking function decreases during each
iteration. For a bound on the number of memory cells used, we can imagine adding a variable
mem_usage to the program, which is initially zero and increments each time memory
is allocated and decrements each time it is freed. We might be interested in obtaining a
bound on mem_usage in terms of the size of the input tree.
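
The mem_usage instrumentation can be pictured with a small C sketch. The wrapper names counted_malloc and counted_free, and the extra high-water-mark counter mem_peak, are hypothetical and introduced only for illustration; the text itself only asks for a counter that goes up on allocation and down on deallocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Instrumentation counter: heap cells currently allocated via the wrappers. */
static long mem_usage = 0;
/* Largest value mem_usage has reached (an assumed extra, for bound queries). */
static long mem_peak = 0;

/* Allocate one cell and count it. */
static void *counted_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL) {
        mem_usage++;
        if (mem_usage > mem_peak) mem_peak = mem_usage;
    }
    return p;
}

/* Free one cell and uncount it; freeing a null pointer changes nothing. */
static void counted_free(void *p) {
    if (p != NULL) {
        assert(mem_usage > 0);
        mem_usage--;
        free(p);
    }
}
```

Replacing the calls to malloc and free in Figure 1.1 with such wrappers would make a bound on mem_usage (or mem_peak) a statement about an ordinary integer variable.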

In  this  example,  answering  either  of  these  questions  requires  some  reasoning  about  the 
shape  and  size  properties  of  heap-allocated  data  structures.  What  we  show  in  this  thesis, 
and  demonstrate  in  our  experiments,  is  that  the  shape  reasoning  can  be  separated  from 
the  numeric  reasoning  by  constructing  a  numeric  program  that  explicitly  tracks  changes 
in  data  structure  sizes.  A  graphical  view  of  the  steps  in  the  algorithm  is  given  in  Figure 
1.2.  The  figure  also  shows  the  values  of  the  slen  and  ssize  size  measures,  which  we  will 
describe  shortly. 

A numeric program for this example is given in Figure 1.3. This program can be
constructed from the original using the rules in Chapter 4, and an equivalent, though larger,
program can be constructed automatically by the analysis implementation discussed in
Chapter 5. In each case, the variables in the numeric program correspond to size properties
of the data structures involved.

struct Tree {
  Tree left;
  Tree right;
}

struct TreeList {
  Tree tree;
  TreeList next;
}

TreeList push(Tree r, TreeList next) {
  TreeList t;
  t = malloc();
  t->tree = r;
  t->next = next;
  return t;
}

void traverse(Tree root) {
  TreeList stack, tail;
  stack = push(root, 0);
  while (stack != 0) {
    tail = stack->next;
    if (stack->tree == 0) {  // remove empty trees
      free(stack);
      stack = tail;
    }
    else {  // process non-empty trees
      tail = push(stack->tree->right, tail);
      tail = push(stack->tree->left, tail);
      free(stack);
      stack = tail;
    }
  }
}

Figure 1.1: A function for depth-first traversal of a tree rooted at root

Informally, tsize_root is the number of nodes in the tree at the top of the stack,
the variable slen tracks the number of nodes in the stack, and ssize is the number of
nodes in the trees linked to by nodes in the stack, as depicted in Figure 1.4. The main
integer variables ssize and slen are updated by means of a number of temporary
variables. These updates are sometimes non-deterministic. For example, in the while loop
in traverse, we remove the first element of the stack and, if it links to a non-empty
tree, we replace it with two nodes that link to that tree’s children. Thus, in the numeric
program we must represent how removing an element from the stack changes the values
slen and ssize. In the case of slen we know that the length simply decreases by one.
For ssize, however, the effect of removing an element is not deterministic. The most we
can conclude is that ssize can be broken into tsize, the size of the tree linked to by
the element we just removed, and ssize_tail, the size of the remaining portion of the
stack. This is accomplished by the non-deterministic assignment on line 5 coupled with the
assume statements at lines 6 and 7. A similar situation occurs on line 11, when we record
the relationship between tsize and the sizes of its left and right children (tsize_l and
tsize_r, respectively).

While assume statements are not part of standard C, they are accepted by many
verification tools, allowing us to pass the code in Figure 1.3 directly to ARMC or
Terminator in order to check termination. In this case, the termination argument involves a
lexicographic order on ssize and slen. By producing numeric abstractions such as that
given in Figure 1.3, we allow ourselves and our program analysis tool to concentrate on
the shape analysis problem, while leaving details of lexicographic rankings or disjunctive
well-foundedness [Podelski and Rybalchenko, 2004] to other tools.

We can also ask bounds analysis tools such as those described in [Gulwani et al., 2009]
and [Cook et al., 2009a] for a bound on the length of the stack. In this case, the stack can
grow to size tsize_root + 1 if the tree is maximally unbalanced. The theory presented in
Chapter 4 also allows us to obtain a numeric program that demonstrates the expected
logarithmic bound on stack length for balanced trees. However, the shape analysis used by
our tool to compute numeric programs does not yet support reasoning about tree balance, so
such proofs still involve a manual component.
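
The lexicographic argument and the stack-length bound can be checked on sample runs of the numeric program in Figure 1.3. The following C sketch is only an illustration under stated assumptions: the non-deterministic assignments are resolved by a small pseudo-random generator (next_rand and choose are hypothetical helpers), and values are drawn so that the assume statements hold, which corresponds to discarding runs that violate them. It is a test harness, not a proof and not the analysis described in this thesis.

```c
#include <assert.h>

/* Deterministic pseudo-random source used to resolve "?" (an assumption). */
static unsigned next_rand(unsigned *state) {
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Pick some value in [0, hi]. */
static int choose(unsigned *state, int hi) {
    return (int)(next_rand(state) % (unsigned)(hi + 1));
}

typedef struct {
    int steps;     /* loop iterations executed */
    int max_slen;  /* largest value of slen observed */
    int rank_ok;   /* 1 if (ssize, slen) decreased lexicographically every time */
} RunResult;

/* One run of the numeric abstraction from Figure 1.3. */
static RunResult run_abstraction(int tsize_root, unsigned seed) {
    RunResult r = {0, 1, 1};
    assert(tsize_root >= 0);
    int slen = 1;
    int ssize = tsize_root;
    while (slen > 0) {
        int old_ssize = ssize, old_slen = slen;
        /* tsize = ?; ssize_tail = ?; both >= 0 and summing to ssize */
        int tsize = choose(&seed, ssize);
        int ssize_tail = ssize - tsize;
        if (tsize == 0) {          /* remove empty trees */
            slen--;
        } else {                   /* process non-empty trees */
            /* tsize_l = ?; tsize_r = ?; with tsize == tsize_l + tsize_r + 1 */
            int tsize_l = choose(&seed, tsize - 1);
            int tsize_r = tsize - 1 - tsize_l;
            ssize = tsize_l + tsize_r + ssize_tail;
            slen++;
        }
        /* Lexicographic decrease of the pair (ssize, slen). */
        if (!(ssize < old_ssize || (ssize == old_ssize && slen < old_slen)))
            r.rank_ok = 0;
        if (slen > r.max_slen) r.max_slen = slen;
        r.steps++;
    }
    return r;
}
```

On every run, removing an empty tree leaves ssize unchanged and decreases slen, while processing a non-empty tree decreases ssize by one, so rank_ok stays 1 and max_slen never exceeds tsize_root + 1.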


Alternate Abstractions It is often the case that there are different notions of data
structure size. The measures used in Figure 1.3 are fairly natural in the sense that the number
of allocated heap cells reachable through the stack is the sum of slen and ssize. If
we abandon this correspondence, we can obtain the simpler numeric abstraction given in
Figure 1.6. In this case we have only one main size variable, ssize, which tracks the
sum of the sizes of the subtrees reachable through the stack. However, we alter the notion
of tree size such that empty trees have size equal to one, as depicted in Figure 1.5. This
simplifies the termination argument, as there is now only a single count, ssize, which
decreases during every iteration. However, we lose the ability to talk about the length of
the stack when computing bounds and we lose the close connection between our counts
and the number of allocated heap cells.

void traverse(int tsize_root) {
 1:  assume(tsize_root >= 0);
 2:  slen = 1;
 3:  ssize = tsize_root;
 4:  while (slen > 0) {
 5:    tsize = ?; ssize_tail = ?;
 6:    assume(tsize >= 0 && ssize_tail >= 0);
 7:    assume(ssize == tsize + ssize_tail);
 8:    if (tsize == 0)  // remove empty trees
 9:      slen--;
10:    else {  // process non-empty trees
11:      tsize_l = ?; tsize_r = ?;
12:      assume(tsize_l >= 0 && tsize_r >= 0);
13:      assume(tsize == tsize_l + tsize_r + 1);
14:      ssize = tsize_l + tsize_r + ssize_tail;
15:      slen++;
       }
     }
}

Figure  1.3:  A  numeric  abstraction  of  the  program  in  Figure  1.1. 

The  technique  described  in  this  thesis  has  the  flexibility  to  allow  either  approach  to 
numeric  abstraction,  and  the  implementation  is  not  tied  to  any  fixed  notion  of  size.  Instead, 
we  allow  the  user  to  specify  the  definition  of  size  they  have  in  mind  when  running  the 
tool.  The  numeric  abstraction  corresponding  to  the  input  C  program  is  then  automatically 
generated  for  that  notion  of  size. 
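
The two notions of size can be compared on a concrete tree. The following C sketch is an illustration only: it renders Figure 1.1 in standard C with explicit pointers, and the names size_nil0, size_nil1, node, and traverse_count are hypothetical. It counts loop iterations of the traversal so that they can be compared with the size measure in which empty trees have size one.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Tree { struct Tree *left, *right; } Tree;
typedef struct TreeList { Tree *tree; struct TreeList *next; } TreeList;

/* Tree size where empty trees have size 0 (the notion in Figure 1.4). */
static int size_nil0(const Tree *t) {
    return t == NULL ? 0 : size_nil0(t->left) + size_nil0(t->right) + 1;
}

/* Tree size where empty trees have size 1 (the notion in Figure 1.5). */
static int size_nil1(const Tree *t) {
    return t == NULL ? 1 : size_nil1(t->left) + size_nil1(t->right) + 1;
}

/* Build a tree node from two (possibly empty) subtrees. */
static Tree *node(Tree *l, Tree *r) {
    Tree *t = malloc(sizeof *t);
    assert(t != NULL);
    t->left = l;
    t->right = r;
    return t;
}

static TreeList *push(Tree *r, TreeList *next) {
    TreeList *t = malloc(sizeof *t);
    assert(t != NULL);
    t->tree = r;
    t->next = next;
    return t;
}

/* The traversal of Figure 1.1, returning the number of loop iterations. */
static int traverse_count(Tree *root) {
    int iterations = 0;
    TreeList *stack = push(root, NULL);
    while (stack != NULL) {
        TreeList *tail = stack->next;
        if (stack->tree != NULL) {  /* process non-empty trees */
            tail = push(stack->tree->right, tail);
            tail = push(stack->tree->left, tail);
        }
        free(stack);                /* empty trees are simply removed */
        stack = tail;
        iterations++;
    }
    return iterations;
}
```

For a tree with n nodes, size_nil1 returns 2n + 1, and the traversal pops each non-empty tree and each empty tree exactly once, so the iteration count coincides with the initial value of ssize under the alternate notion of size.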

Figure 1.4: An example showing slen and ssize used in the program in Figure 1.3. slen is the
number of nodes in the stack and ssize is the sum of the values in the bold circles. The shaded
area contains the nodes that contribute to ssize and nodes in this area are labeled with the size of
the subtree rooted at that node. Empty trees (denoted by nil) have size 0.


Figure 1.5: An illustration of the notion of ssize used to generate the program in Figure 1.6. The
shaded area contains the nodes contributing to ssize. Empty trees (denoted by nil) have size 1.
Non-empty nodes are labeled with the size of the subtree rooted at that node. ssize is the sum
of the values in the bold circles, plus 1 for the first element in the stack, as nil has size 1 using this
notion of size.

void traverse(int tsize_root) {
  assume(tsize_root > 0);
  ssize = tsize_root;
  while (ssize > 0) {
    tsize = ?; ssize_tail = ?;
    assume(tsize > 0 && ssize_tail >= 0);
    assume(ssize == tsize + ssize_tail);
    if (tsize == 1)  // remove empty trees
      ssize = ssize_tail;
    else {  // process non-empty trees
      tsize_l = ?; tsize_r = ?;
      assume(tsize_l > 0 && tsize_r > 0);
      assume(tsize == tsize_l + tsize_r + 1);
      ssize = tsize_l + tsize_r + ssize_tail;
    }
  }
}

Figure  1.6:  A  numeric  abstraction  of  the  program  in  Figure  1.1  with  the  notion  of  ssize  and  tsize 
given  in  Figure  1.5. 


Chapter 2
Preliminaries 


In this chapter we present the basic definitions on which we will build the theory of
instrumented programs and numeric abstractions that is the topic of this thesis. In Section 2.1,
we present the syntax and semantics of the programming language we consider. Section
2.2 gives the syntax and semantics of the version of separation logic we use. Section 2.2.2
gives the syntax and semantics we adopt for inductive predicates in separation logic.
Finally, Section 2.4 describes how we can translate C programs into the language defined
in this chapter.


Notation  A  summary  of  the  notation  used  in  the  thesis  is  given  as  Appendix  A.  This 
notation  is  described  in  detail  in  this  and  subsequent  chapters. 


2.1  Programs 

Since our final goal is to analyze C-language programs, we consider an imperative
programming language with unstructured flow of control (also referred to as a goto language).
Because of the non-returning nature of gotos, the language is presented as a language of
continuations. This serves as a convenient intermediate language for C since the C
language contains a goto statement and all other control-flow constructs can be reduced to
branches and gotos. We give examples of such reductions in Section 2.4.

The  language  is  strongly  typed,  which  deviates  from  C.  We  make  this  choice  because 
it  allows  us  to  focus  on  issues  of  memory  safety,  assertion  safety,  and  termination  while 
ignoring  issues  such  as  pointer  arithmetic  and  casts. 


2.1.1  Syntax  and  Typing 

Figure 2.1 gives the syntax for programs. A program P is a list of labeled continuations,
which can also be viewed as a partial mapping from labels to continuations (and we will
often use function syntax for P, writing P(l) for the continuation labeled with l in program
P). The first label $l_0$ is taken to be the starting point of execution and $l_0$ will be referred
to as the initial location. We write initloc(P) for the initial location of program P. The set
L of labels is assumed to be infinite.

A continuation is a branching structure consisting of conditional branches and
commands that update the state. At the leaves of each continuation, we have either a goto
or an indication that execution should halt or abort. We write $\epsilon$ for the empty list of
branch cases and omit it when writing branching continuations. For example, we write
branch true $\Rightarrow$ k end instead of branch true $\Rightarrow$ k, $\epsilon$ end. We list assume(e); k as a
continuation, but this is actually definable as branch e $\Rightarrow$ k end, a fact we return to in
Section 2.3.4.

Commands  include  the  standard  commands  for  variable  assignment,  heap  lookup, 
heap  mutation,  memory  allocation,  and  deallocation.  The  commands  range  over  variables 
drawn  from  the  infinite  set  Vars  and  field  names  drawn  from  the  infinite  set  Fields. 

We will write $k \in \mathit{subterms}(P)$ if k is a sub-term of some continuation in the range of
P. A program P is considered well-formed iff
$\{l \mid \mathsf{goto}\ l \in \mathit{subterms}(P)\} \subseteq \mathit{dom}(P)$,
where dom(P) is the domain of P (the set of labels prefixing continuations in P). This
ensures that all jumps are to locations defined by P. We will restrict ourselves to
well-formed programs for the rest of this thesis.
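
The well-formedness condition can be sketched as a recursive check over continuations. The C representation below (the Kont and Labeled types, integer labels, and the function names) is hypothetical and chosen only for illustration; commands are left abstract, and assume(e); k is treated like a command followed by k.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { K_HALT, K_ABORT, K_GOTO, K_CMD, K_BRANCH } KontKind;

typedef struct Kont {
    KontKind kind;
    int target;                /* K_GOTO: the label jumped to */
    const struct Kont *next;   /* K_CMD: continuation after the command */
    const struct Kont **cases; /* K_BRANCH: continuation of each branch case */
    size_t ncases;             /* K_BRANCH: number of branch cases */
} Kont;

/* One labeled continuation l : k; a program is an array of these. */
typedef struct { int label; const Kont *body; } Labeled;

/* Is the label in dom(P)? */
static int in_dom(const Labeled *p, size_t n, int label) {
    for (size_t i = 0; i < n; i++)
        if (p[i].label == label) return 1;
    return 0;
}

/* Does every goto among the sub-terms of k target a label in dom(P)? */
static int targets_defined(const Labeled *p, size_t n, const Kont *k) {
    assert(k != NULL);
    switch (k->kind) {
    case K_GOTO:
        return in_dom(p, n, k->target);
    case K_CMD:
        return targets_defined(p, n, k->next);
    case K_BRANCH:
        for (size_t i = 0; i < k->ncases; i++)
            if (!targets_defined(p, n, k->cases[i])) return 0;
        return 1;
    default:                   /* halt and abort contain no jumps */
        return 1;
    }
}

/* P is well-formed iff all jumps are to locations defined by P. */
static int well_formed(const Labeled *p, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (!targets_defined(p, n, p[i].body)) return 0;
    return 1;
}
```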

Variables and expressions are typed, with the types drawn from the set $\{a, i, b\}$
(representing addresses, integers, and Booleans, respectively). We assume that the set Vars
can be partitioned into two infinite subsets $\mathit{Vars}_a$ and $\mathit{Vars}_i$. We do not include variables
of type b in our syntax or states. We write $x^a$ to denote an element of $\mathit{Vars}_a$ and $x^i$ for an
element of $\mathit{Vars}_i$. We use $\tau$ to stand for either a or i. Often, types can be inferred from the
context and, in such cases, we will omit them.

We take a similar approach to typing of record fields. We assume the set Fields can
be partitioned into two infinite subsets $\mathit{Fields}_a$ and $\mathit{Fields}_i$ and write $f^a$ for elements of
$\mathit{Fields}_a$ and $f^i$ for elements of $\mathit{Fields}_i$.

We  make  a  distinction  between  integer  values  and  values  representing  addresses  as  a 
means  of  ruling  out  pointer  arithmetic.  Pointer  arithmetic  could  be  handled  by  moving 
to  a  lower-level  memory  model,  where  addresses  are  integers  and  records  are  represented 
by  contiguous  groups  of  memory  cells.  However,  our  analysis  algorithm  does  not  support 
pointer  arithmetic,  so  we  chose  to  rule  it  out  from  the  beginning. 

2.1.2  Semantics 

The semantics is given in terms of transitions between states. Each non-terminal state
includes a store paired with a heap. Formally, a store is a mapping from variables to their
values, which are either integers or addresses. We require that this mapping respects types
and indicate this by using the notation $\to_\tau$ to denote the function space. A function $f$ is in
$\mathit{Vars} \to_\tau \mathit{Values}$ iff $f \in \mathit{Vars} \to \mathit{Values}$ and variables in $\mathit{Vars}_i$ are mapped by $f$ to integers
while variables in $\mathit{Vars}_a$ are mapped to addresses. We assume that $\mathbb{Z}$ and Addr are disjoint
and that Addr is an infinite set. We use the meta-variable v to represent a value and s to
represent a store.

$v \in \mathit{Values} = \mathbb{Z} \cup \mathit{Addr}$
$s \in \mathit{Stores} = \mathit{Vars} \to_\tau \mathit{Values}$

Syntax of Programs

Types                $\tau \in \{a, i\}$
Variables            $x^\tau \in \mathit{Vars}_\tau$
Fields               $f^\tau \in \mathit{Fields}_\tau$
Labels               $l \in L$
Integers             $n \in \mathbb{Z}$
Integer Expressions  $e^i ::= x^i \mid n \mid e^i_1 + e^i_2 \mid e^i_1 - e^i_2 \mid e^i_1 \times e^i_2$
Address Expressions  $e^a ::= x^a \mid \mathsf{nil}$
Boolean Expressions  $e^b ::= \mathsf{true} \mid \mathsf{false} \mid e^a_1 = e^a_2 \mid e^i_1 \le e^i_2 \mid e^b_1 \wedge e^b_2 \mid e^b_1 \vee e^b_2 \mid \neg e^b$
Commands             $c ::= x^\tau := e^\tau \mid x^\tau := ?^\tau \mid x_1^\tau := x_2^a.f^\tau \mid x^a.f^\tau := e^\tau \mid x^a := \mathsf{alloc}(f_1^{\tau_1}, \ldots, f_n^{\tau_n}) \mid \mathsf{free}\ x^a \mid \mathsf{skip}$
Branch Cases         $\beta ::= e^b \Rightarrow k, \beta \mid \epsilon$
Continuations        $k ::= c; k \mid \mathsf{halt} \mid \mathsf{abort} \mid \mathsf{goto}\ l \mid \mathsf{branch}\ \beta\ \mathsf{end} \mid \mathsf{assume}(e^b); k$
Programs             $P$ ::= a list of labeled continuations $l : k$

Figure 2.1: Syntax of programs.

The set of addresses contains a distinguished element nil which is not in the domain
of any heap. The heap is a finite partial function from non-nil addresses to records, which
are finite partial functions from fields to values of the appropriate type. We use the
meta-variable h to represent an element of Heaps.

$\mathit{Records} = \mathit{Fields} \rightharpoonup_{\mathrm{fin}} \mathit{Values}$
$h \in \mathit{Heaps} = (\mathit{Addr} - \{\mathsf{nil}\}) \rightharpoonup_{\mathrm{fin}} \mathit{Records}$

As with stores, the functions that serve as the denotation of records must respect types.
Unlike stores, they need not be defined on all elements of the domain (different heap cells
may contain different sets of fields). We refer to an $(s, h)$ pair as a memory state.

Memory States  $(s, h) \in \mathit{Stores} \times \mathit{Heaps}$

We  also  include  an  error  state  representing  the  result  of  an  erroneous  computation  such 
as  an  attempt  to  dereference  unallocated  memory. 

The semantics of expressions is given in Figure 2.2. In addition to the sets Addr and
$\mathbb{Z}$ that were defined previously, the semantics of expressions makes use of a set Bool of
Boolean values, defined as $\mathit{Bool} = \{\mathsf{true}, \mathsf{false}\}$. We note the following theorem, which
relates the meaning of expressions to their types and ensures that our interpretation of
expressions is well-defined.

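Theorem 1.

$\forall s, e^a.\ [\![e^a]\!]\, s \in \mathit{Addr}$  (2.1)

$\forall s, e^i.\ [\![e^i]\!]\, s \in \mathbb{Z}$  (2.2)

$\forall s, e^b.\ [\![e^b]\!]\, s \in \mathit{Bool}$  (2.3)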

Proof.  The  proof  is  by  induction  on  the  structure  of  the  expression  language  and  each 
case  follows  directly  from  the  expression  semantics  and  the  requirement  that  stores  are 
well-typed.  □ 

Another  property  of  expressions  is  that  only  the  portion  of  the  store  involving  the 
variables  that  appear  in  the  expression  affects  its  value.  This  is  captured  by  the  following 
lemma. 

Definition 1. Let $s =_V s'$ hold iff $\forall x.\ x \in V \Rightarrow s(x) = s'(x)$.

Definition  2.  Let  fv(e)  be  the  function  that  returns  the  set  of  variables  occurring  free  in  e. 
Since  there  are  no  binding  constructs  in  the  expression  language,  this  is  just  the  set  of  all 
variables  appearing  in  e. 

Lemma 1. If $s =_V s'$ and $\mathit{fv}(e) \subseteq V$ then $[\![e]\!]\, s = [\![e]\!]\, s'$.

Proof. The proof is by induction on the expression e. The inductive cases are
straightforward. To take an example, consider the case $e_1 + e_2$. We assume $s =_V s'$ and
$\mathit{fv}(e_1 + e_2) \subseteq V$. The second assumption implies $\mathit{fv}(e_1) \subseteq V$ and $\mathit{fv}(e_2) \subseteq V$. This allows
us to apply the induction hypothesis and conclude that $[\![e_1]\!]\, s = [\![e_1]\!]\, s'$ and
$[\![e_2]\!]\, s = [\![e_2]\!]\, s'$. It then follows that $[\![e_1]\!]\, s + [\![e_2]\!]\, s = [\![e_1]\!]\, s' + [\![e_2]\!]\, s'$, which, by the
definition of $[\![e_1 + e_2]\!]$, implies that $[\![e_1 + e_2]\!]\, s = [\![e_1 + e_2]\!]\, s'$.

The base cases for the constants are immediate, as the store does not affect their
semantics at all. This covers n, nil, true, and false. We are left with the variable case. If
$e = x$ then $[\![e]\!]\, s = s(x)$, so we must show $s(x) = s'(x)$. The definition of $s =_V s'$ gives
us $x \in V \Rightarrow s(x) = s'(x)$, so it suffices to show $x \in V$. This follows directly from our
assumption that $\mathit{fv}(x) \subseteq V$ and the fact that $\mathit{fv}(x) = \{x\}$. □

Figure 2.2: Semantics of expressions. $\wedge$, $\vee$, $\neg$ in the definitions refer to the standard Boolean
operations with type $\mathit{Bool} \times \mathit{Bool} \to \mathit{Bool}$ (for $\wedge$ and $\vee$) and $\mathit{Bool} \to \mathit{Bool}$ (for $\neg$). The functions
$+$, $-$, $\times$ refer to the standard addition, subtraction, and multiplication functions of type
$\mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$. The $\le$ relation is the standard “less than or equal to” relation on integers and $=$ is
the identity relation on addresses, which relates each address only to itself.
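
For reference, one reading of the expression semantics that is consistent with the caption of Figure 2.2, Theorem 1, and the cases used in the proof of Lemma 1 is sketched below. The variable and addition cases are the ones quoted in the proof; the remaining clauses are an assumption that follows the usual compositional pattern and may differ in presentation from the figure itself.

```latex
\begin{align*}
[\![x]\!]\, s &= s(x) &
[\![e^b_1 \wedge e^b_2]\!]\, s &= [\![e^b_1]\!]\, s \wedge [\![e^b_2]\!]\, s \\
[\![n]\!]\, s &= n &
[\![e^b_1 \vee e^b_2]\!]\, s &= [\![e^b_1]\!]\, s \vee [\![e^b_2]\!]\, s \\
[\![\mathsf{nil}]\!]\, s &= \mathsf{nil} &
[\![\neg e^b]\!]\, s &= \neg\, [\![e^b]\!]\, s \\
[\![e^i_1 + e^i_2]\!]\, s &= [\![e^i_1]\!]\, s + [\![e^i_2]\!]\, s &
[\![e^a_1 = e^a_2]\!]\, s &= \bigl([\![e^a_1]\!]\, s = [\![e^a_2]\!]\, s\bigr) \\
[\![e^i_1 - e^i_2]\!]\, s &= [\![e^i_1]\!]\, s - [\![e^i_2]\!]\, s &
[\![e^i_1 \le e^i_2]\!]\, s &= \bigl([\![e^i_1]\!]\, s \le [\![e^i_2]\!]\, s\bigr) \\
[\![e^i_1 \times e^i_2]\!]\, s &= [\![e^i_1]\!]\, s \times [\![e^i_2]\!]\, s &
[\![\mathsf{true}]\!]\, s = \mathsf{true}, &\quad [\![\mathsf{false}]\!]\, s = \mathsf{false}
\end{align*}
```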

The semantics of commands is given in Figure 2.3. The command x := e is a standard
assignment statement, x := ? is non-deterministic assignment, $x_1 := x_2.f$ reads a value
from a heap cell, and x.f := e writes a value into a heap cell. Attempts to read from or
write to a non-existent record field result in a run-time error, represented by error. The
command $x := \mathsf{alloc}(f_1, \ldots, f_n)$ allocates a new heap cell with fields $f_1, \ldots, f_n$. The
fields are initially mapped to non-deterministically chosen values of the correct type. The
field names provided must all be distinct. The command free x disposes of the heap cell
at x. We permit the call free nil, which has the effect of a no-op. We do this to match the
semantics of the “free” function call in the C programming language, which will be the
source language we ultimately target with our analysis.
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We  claim  that  the  type  of  [c]  is  Stores  x  Heaps  —t  2^StoresxHeaps^error^ .  To  verify 
this,  we  must  check  that,  in  all  rules,  the  store  and  heap  are  updated  in  a  manner  consistent 
with  the  types.  In  all  cases,  this  follows  immediately  from  the  well-typedness  of  the  initial 
store  and  heap  and  Theorem  1. 
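
To make the command semantics concrete, here is a minimal sketch of the deterministic commands as a Python function from a store and heap to a list of outcomes. It assumes heaps are dictionaries from addresses to records, that nil is modelled as address 0, and that right-hand-side expressions are supplied as Python functions of the store; the tuple encoding of commands is hypothetical. Non-deterministic assignment and alloc are omitted to keep the outcome sets finite.

```python
ERROR = "error"   # the distinguished error outcome
NIL = 0           # assumption of this sketch: nil is modelled as address 0

def step(cmd, s, h):
    """Return the list of possible outcomes of cmd from store s and heap h."""
    kind = cmd[0]
    if kind == "skip":
        return [(s, h)]
    if kind == "assign":                     # x := e
        _, x, e = cmd
        return [({**s, x: e(s)}, h)]
    if kind == "load":                       # x1 := x2.f
        _, x1, x2, f = cmd
        a = s[x2]
        if a in h and f in h[a]:
            return [({**s, x1: h[a][f]}, h)]
        return [ERROR]
    if kind == "store":                      # x.f := e
        _, x, f, e = cmd
        a = s[x]
        if a in h and f in h[a]:
            return [(s, {**h, a: {**h[a], f: e(s)}})]
        return [ERROR]
    if kind == "free":                       # free x (free nil is a no-op)
        _, x = cmd
        a = s[x]
        if a in h or a == NIL:
            return [(s, {k: v for k, v in h.items() if k != a})]
        return [ERROR]
    raise ValueError(f"unknown command: {kind}")

s = {"x": 1, "y": NIL}
h = {1: {"next": NIL}}
print(step(("load", "y", "x", "next"), s, h))   # reads the next field of cell 1
print(step(("load", "x", "y", "next"), s, h))   # dereferences nil
```

Each erroneous case returns the singleton list `[ERROR]`, mirroring the singleton set of the error case in the semantics.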

One property of commands is that only the heap and the portion of the store corresponding
to the variables used by the command affect execution. This is captured by the
following lemma.

Definition  3.  Let  fv(c)  indicate  the  set  of  free  variables  occurring  in  command  c.  Since 
there  are  no  binders  in  the  syntax  for  commands,  this  is  the  set  of  all  variables  occurring 
in  c. 

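Lemma 2. If $s_1 =_V s_2$ and $\mathit{fv}(c) \subseteq V$ then for all $h, s_1', h'$ the following holds:

$$((s_1', h') \in \llbracket c \rrbracket (s_1, h)) \Rightarrow \exists s_2'.\ ((s_2', h') \in \llbracket c \rrbracket (s_2, h) \wedge s_1' =_V s_2')$$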

This  states  that  if  V  is  a  set  containing  the  free  variables  of  command  c,  and  two  stores 
agree  on  the  values  of  variables  in  V,  then  an  evaluation  of  c  from  either  of  the  two  stores 
has  a  matching  evaluation  starting  from  the  other  store  (matching  in  the  sense  that  the 
post-states  agree  on  the  values  of  variables  in  V). 

Proof. The proof proceeds by case analysis on the command $c$ in question and most cases
follow directly from the definition of $\llbracket c \rrbracket$ and Lemma 1. Note that according to the
semantics in Figure 2.3, we have

$$\forall c, s, h.\ (\mathsf{error} \in \llbracket c \rrbracket (s, h)) \Rightarrow (\llbracket c \rrbracket (s, h) = \{\mathsf{error}\})$$

To see why this holds, note that the only commands that can result in error are those of
the form $x_1 := x_2.f$ or $x.f := e$ or free $x$. Examining the semantics for these commands
reveals that the error case results in the singleton set $\{\mathsf{error}\}$. Thus, the fact that we have
$(s_1', h') \in \llbracket c \rrbracket (s_1, h)$ as a hypothesis implies that $\mathsf{error} \notin \llbracket c \rrbracket (s_1, h)$.

CASE $x.f := e$: Since $\mathsf{error} \notin \llbracket x.f := e \rrbracket (s_1, h)$, we have the following.

$$s_1(x) \in \mathit{dom}(h) \wedge f \in \mathit{dom}(h(s_1(x)))$$

We have $s_1 =_V s_2$ as an assumption and $x \in V$ from our assumption that $\mathit{fv}(x.f := e) \subseteq V$.
This then gives us $s_1(x) = s_2(x)$ and allows us to derive

$$s_2(x) \in \mathit{dom}(h) \wedge f \in \mathit{dom}(h(s_2(x)))$$

This implies that $\llbracket x.f := e \rrbracket (s_2, h)$ does not result in an error. Thus, we have

$$\llbracket x.f := e \rrbracket (s_1, h) = \{(s_1, h[(s_1(x)).f \to (\llbracket e \rrbracket\, s_1)])\}$$

and

$$\llbracket x.f := e \rrbracket (s_2, h) = \{(s_2, h[(s_2(x)).f \to (\llbracket e \rrbracket\, s_2)])\}$$

We must show $s_1 =_V s_2$, which we already have from our assumptions. We also must
show the following.

$$h[(s_1(x)).f \to (\llbracket e \rrbracket\, s_1)] = h[(s_2(x)).f \to (\llbracket e \rrbracket\, s_2)]$$

Since $x \in V$, we have that $s_1(x) = s_2(x)$. Thus, the above reduces to showing that

$$\llbracket e \rrbracket\, s_1 = \llbracket e \rrbracket\, s_2$$

which follows from Lemma 1.

CASE $x_1 := x_2.f$: Again, we have from our assumptions that $x_1 := x_2.f$ does not result
in error. From $s_1 =_V s_2$ and $\mathit{fv}(x_1 := x_2.f) \subseteq V$, we have that $s_1(x_1) = s_2(x_1)$ and
$s_1(x_2) = s_2(x_2)$. This gives us the following.

$$\llbracket x_1 := x_2.f \rrbracket (s_1, h) = \{(s_1[x_1 \to (h(s_1(x_2)))\, f], h)\}$$

and

$$\llbracket x_1 := x_2.f \rrbracket (s_2, h) = \{(s_2[x_1 \to (h(s_2(x_2)))\, f], h)\}$$

We must show

$$s_1[x_1 \to (h(s_1(x_2)))\, f] =_V s_2[x_1 \to (h(s_2(x_2)))\, f]$$

We have that $x_1 \in V$ and $s_1 =_V s_2$, so the above will hold if we can show

$$h(s_1(x_2)) = h(s_2(x_2))$$

This holds if $s_1(x_2) = s_2(x_2)$, which follows from $x_2 \in V$ and $s_1 =_V s_2$.

CASE free $x$: As before, we have $s_1 =_V s_2$ and $\mathit{fv}(\text{free } x) \subseteq V$, which implies $x \in V$ and
thus $s_1(x) = s_2(x)$. Since $\llbracket \text{free } x \rrbracket (s_1, h) \neq \{\mathsf{error}\}$, we have $s_1(x) \in (\mathit{dom}(h) \cup \{\text{nil}\})$.
This combined with $s_1(x) = s_2(x)$ gives us $s_2(x) \in (\mathit{dom}(h) \cup \{\text{nil}\})$. Since
$\llbracket \text{free } x \rrbracket (s_1, h) = \{(s_1, h - \{s_1(x)\})\}$ and $\llbracket \text{free } x \rrbracket (s_2, h) = \{(s_2, h - \{s_2(x)\})\}$, we must
show $s_1 =_V s_2$, which we already have, and $(h - \{s_1(x)\}) = (h - \{s_2(x)\})$, which
follows from $s_1(x) = s_2(x)$.

CASE $x := {?}$: We have

$$\llbracket x := {?} \rrbracket (s_1, h) = \{(s_1', h) \mid s_1' = s_1[x \to v]\}$$

where $v$ is chosen from the appropriate domain (either Addr or $\mathbb{Z}$). For $s_2$ we have

$$\llbracket x := {?} \rrbracket (s_2, h) = \{(s_2', h) \mid s_2' = s_2[x \to v]\}$$

Suppose $(s_1', h) \in \llbracket x := {?} \rrbracket (s_1, h)$. We must show

$$\exists s_2'.\ (s_2', h) \in (\llbracket x := {?} \rrbracket (s_2, h)) \wedge s_1' =_V s_2'$$

We choose $s_2' = s_2[x \to s_1'(x)]$. Clearly this is in $\llbracket x := {?} \rrbracket (s_2, h)$. To see that $s_1' =_V s_2'$, we
must show that $s_2'(x) = s_1'(x)$, which is immediate from the definition of $s_2'$. Agreement
of $s_2'$ and $s_1'$ on the rest of $V$ follows from the assumption that $s_1 =_V s_2$.

CASE $x := \text{alloc}(f_1, \ldots, f_n)$: The semantics of this command chooses an address $v$ not in
$\mathit{dom}(h)$ and assigns $v$ to $x$ in the post-state. Since we are evaluating $x := \text{alloc}(f_1, \ldots, f_n)$
under the same heap but a different store, we have that $v$ is also a valid choice of
address when determining $\llbracket x := \text{alloc}(f_1, \ldots, f_n) \rrbracket (s_2, h)$. It remains to show that
$s_1[x \to v] =_V s_2[x \to v]$, which follows from $s_1 =_V s_2$.

CASE $x := e$: We have

$$\llbracket x := e \rrbracket (s_1, h) = \{(s_1[x \to \llbracket e \rrbracket\, s_1], h)\}$$

and

$$\llbracket x := e \rrbracket (s_2, h) = \{(s_2[x \to \llbracket e \rrbracket\, s_2], h)\}$$

We must show

$$s_1[x \to \llbracket e \rrbracket\, s_1] =_V s_2[x \to \llbracket e \rrbracket\, s_2]$$

Since we have $s_1 =_V s_2$, it suffices to show that $\llbracket e \rrbracket\, s_1 = \llbracket e \rrbracket\, s_2$. This is established by
Lemma 1. □

We  also  have  a  similar  property  for  commands  that  result  in  an  error. 

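Lemma 3. If $s_1 =_V s_2$ and $\mathit{fv}(c) \subseteq V$ then

$$\mathsf{error} \in \llbracket c \rrbracket (s_1, h) \Rightarrow \mathsf{error} \in \llbracket c \rrbracket (s_2, h)$$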

Proof. The proof proceeds by case analysis on the command $c$. There are only three
commands that can result in error. These are $x_1 := x_2.f$ and $x.f := e$ and free $x$.

CASE $x_1 := x_2.f$: If $\mathsf{error} \in \llbracket x_1 := x_2.f \rrbracket (s_1, h)$ then, according to the semantics
of commands (Figure 2.3), either $s_1(x_2) \notin \mathit{dom}(h)$ or $f \notin \mathit{dom}(h(s_1(x_2)))$. Suppose
$s_1(x_2) \notin \mathit{dom}(h)$. Then since $s_1 =_V s_2$ and $x_2 \in V$ we have $s_1(x_2) = s_2(x_2)$ and thus
$s_2(x_2) \notin \mathit{dom}(h)$. If $f \notin \mathit{dom}(h(s_1(x_2)))$, then again we note that $x_2 \in V$ and thus
$s_1(x_2) = s_2(x_2)$, which gives us $f \notin \mathit{dom}(h(s_2(x_2)))$.

CASE $x.f := e$: This is similar to the case above. We have either $s_1(x) \notin \mathit{dom}(h)$ or
$f \notin \mathit{dom}(h(s_1(x)))$. We have $x \in V$ and $s_1 =_V s_2$, which yields $s_1(x) = s_2(x)$, which
gives us that either $s_2(x) \notin \mathit{dom}(h)$ or $f \notin \mathit{dom}(h(s_2(x)))$.

CASE free $x$: In this case we have $s_1(x) \notin (\mathit{dom}(h) \cup \{\text{nil}\})$. Again $s_1(x) = s_2(x)$ and
so $s_2(x) \notin (\mathit{dom}(h) \cup \{\text{nil}\})$. □

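Semantics of Commands

$$
\begin{array}{lcl}
\llbracket \text{skip} \rrbracket (s,h) & = & \{(s,h)\} \\[4pt]
\llbracket x^{\tau} := e^{\tau} \rrbracket (s,h) & = & \{(s[x^{\tau} \to \llbracket e^{\tau} \rrbracket\, s], h)\} \\[4pt]
\llbracket x^{a} := {?}^{a} \rrbracket (s,h) & = & \{(s', h) \mid s' = s[x^{a} \to v] \wedge v \in \text{Addr}\} \\[4pt]
\llbracket x^{i} := {?}^{i} \rrbracket (s,h) & = & \{(s', h) \mid s' = s[x^{i} \to v] \wedge v \in \mathbb{Z}\} \\[4pt]
\llbracket x_1^{\tau} := x_2^{a}.f^{\tau} \rrbracket (s,h) & = &
  \begin{cases}
    \{(s[x_1^{\tau} \to (h(s(x_2^{a})))\, f^{\tau}], h)\} & \text{if } s(x_2^{a}) \in \mathit{dom}(h) \wedge f^{\tau} \in \mathit{dom}(h(s(x_2^{a}))) \\
    \{\mathsf{error}\} & \text{otherwise}
  \end{cases} \\[4pt]
\llbracket x^{a}.f^{\tau} := e^{\tau} \rrbracket (s,h) & = &
  \begin{cases}
    \{(s, h[(s(x^{a})).f^{\tau} \to (\llbracket e^{\tau} \rrbracket\, s)])\} & \text{if } s(x^{a}) \in \mathit{dom}(h) \wedge f^{\tau} \in \mathit{dom}(h(s(x^{a}))) \\
    \{\mathsf{error}\} & \text{otherwise}
  \end{cases} \\[4pt]
\llbracket x^{a} := \text{alloc}(f_1^{\tau_1}, \ldots, f_n^{\tau_n}) \rrbracket (s,h) & = &
  \{(s', h') \mid v \in \mathit{dom}(h') \text{ and } \mathit{dom}(h'(v)) = \{f_1^{\tau_1}, \ldots, f_n^{\tau_n}\} \\
  & & \quad \text{and } h' - \{v\} = h \\
  & & \quad \text{and } s' = s[x^{a} \to v] \text{ and } v \in \text{Addr} \\
  & & \quad \text{and } h'(v)(f_j^{\tau_j}) \in \mathbb{Z} \text{ if } \tau_j = i \\
  & & \quad \text{and } h'(v)(f_j^{\tau_j}) \in \text{Addr} \text{ if } \tau_j = a\} \\[4pt]
\llbracket \text{free } x^{a} \rrbracket (s,h) & = &
  \begin{cases}
    \{(s, h - \{s(x^{a})\})\} & \text{if } s(x^{a}) \in (\mathit{dom}(h) \cup \{\text{nil}\}) \\
    \{\mathsf{error}\} & \text{otherwise}
  \end{cases}
\end{array}
$$

Figure 2.3: Semantics of commands. $\mathit{dom}(g)$ indicates the domain of function $g$. The notation
$g[x \to v]$ indicates the function that is the same as $g$, except that $x$ is mapped to $v$. The notation
$h[v_1.f \to v_2]$ indicates the heap that is the same as $h$ except the record at $v_1$ maps field $f$ to $v_2$.
We write $h - X$ to indicate the function obtained by restricting the domain of $h$ to $\mathit{dom}(h) - X$.

Semantics of Continuations

$$
\dfrac{(s', h') \in \llbracket c \rrbracket (s, h)}{((c; k), (s, h)) \rightsquigarrow (k, (s', h'))}
\qquad
\dfrac{\llbracket e_i \rrbracket\, s = \text{true}}{(\text{branch } \ldots, e_i \Rightarrow k_i, \ldots \text{ end}, (s, h)) \rightsquigarrow (k_i, (s, h))}
$$

$$
\dfrac{\mathsf{error} \in \llbracket c \rrbracket (s, h)}{((c; k), (s, h)) \rightsquigarrow \mathsf{error}}
\qquad
\dfrac{}{(\text{halt}, (s, h)) \rightsquigarrow \text{final}(s, h)}
$$

$$
\dfrac{}{((\text{goto } l), (s, h)) \rightsquigarrow \text{goto}(l, (s, h))}
\qquad
\dfrac{}{(\text{abort}, (s, h)) \rightsquigarrow \mathsf{error}}
$$

$$
\dfrac{\llbracket e \rrbracket\, s = \text{true}}{((\text{assume}(e); k), (s, h)) \rightsquigarrow (k, (s, h))}
$$

Figure 2.4: Semantics of continuations. The semantic rule for “assume(e); k” is included for
clarity, but officially we consider “assume(e); k” to be an abbreviation for “branch e ⇒ k end”
(which produces the same result as the rule above).

Figure 2.4 gives the transition semantics of continuations. There are three types of
execution states: intermediate states, in which the continuation is still executing; terminal
states, which indicate that execution has stopped; and goto states, which indicate that
the end of this continuation has been reached but execution has not stopped and should
continue from another continuation. Intermediate states have the form $(k, (s, h))$ where $k$
is the current continuation and $(s, h)$ is the current store and heap. Terminal states either
have the form final$(s, h)$, which indicates that the program has terminated in the memory
state $(s, h)$, or error, which indicates that the program has terminated in the error state.
Goto states have the form goto$(l, (s, h))$ and indicate that execution should continue from
label $l$ in memory state $(s, h)$ (the role of goto states is further described in Section 2.3,
Definition 13). We use the meta-variable $\gamma$ to represent an execution state and the
meta-variable $\Gamma$ to represent the set of all execution states.

Execution States ($\Gamma$)   $\gamma ::= (k, (s, h)) \mid \text{final}(s, h) \mid \text{goto}(l, (s, h)) \mid \mathsf{error}$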

We  will  sometimes  simply  use  the  word  state  when  it  is  clear  from  context  whether  we  are 
referring  to  an  execution  state  or  a  memory  state. 

Note that in the semantics for branches given in Figure 2.4, a non-deterministic choice
is made among all branches whose condition is satisfied. There is no transition from a
state in which we are evaluating a branch and none of the conditions hold. We will say
more about how this property of the continuation semantics affects our program semantics
in the next section when we discuss execution traces. Here we will simply note that, in the
source programs we consider, all branches will be total in the sense that the disjunction of
their conditions is equivalent to true. Thus, any execution state associated with a branch
in the source program can always make a transition.

Figure  2.5  gives  an  example  of  the  semantics  of  continuations.  The  arrows  are  labeled 
with  the  commands  corresponding  to  the  transitions.  Transitions  labeled  with  Boolean 
conditions  (i  >  0  in  the  first  transition)  correspond  to  the  selection  of  the  branch  labeled 
with  that  condition. 
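
The transition rules for continuations can be sketched as a one-step function. This is a minimal illustration under stated assumptions: continuations are encoded as tuples, commands are Python functions from a store and heap to a list of outcomes, and branch conditions are Python predicates on the store. None of these encodings come from the thesis.

```python
ERROR = "error"

def transitions(k, s, h):
    """All execution states reachable in one step from the state (k, (s, h))."""
    kind = k[0]
    if kind == "seq":                        # c; k'
        _, c, rest = k
        return [ERROR if o == ERROR else ("state", rest, o[0], o[1])
                for o in c(s, h)]
    if kind == "branch":                     # branch e1 => k1, ..., en => kn end
        # Every branch whose condition holds is a possible successor.
        return [("state", ki, s, h) for (cond, ki) in k[1] if cond(s)]
    if kind == "halt":
        return [("final", s, h)]
    if kind == "goto":
        return [("goto", k[1], s, h)]
    if kind == "abort":
        return [ERROR]
    raise ValueError(f"unknown continuation: {kind}")

# A hypothetical loop body: decrement i while it is positive, otherwise halt.
halt = ("halt",)
dec = lambda s, h: [({**s, "i": s["i"] - 1}, h)]
body = ("branch", [(lambda s: s["i"] > 0, ("seq", dec, ("goto", "loop"))),
                   (lambda s: s["i"] <= 0, halt)])
first = transitions(body, {"i": 1}, {})
second = transitions(first[0][1], first[0][2], first[0][3])
```

A branch with no satisfied condition yields the empty list, which matches the observation that such a state has no transition; the example `body` is total, so it always has a successor.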


2.2  Separation  Logic 

Note that all non-error states contain a store and a heap. We will use formulae in separation
logic [Reynolds, 2002] to represent sets of store-heap pairs. The syntax for formulae is
given in Figure 2.6 and describes a fragment of separation logic specialized to our heap
model. The expressions ($e$) are those defined in Figure 2.1. $\mathcal{P}$ is a set of identifiers that are
used to refer to inductively-defined predicates, which we discuss in Section 2.2.2.

The semantics of formulae is given in Figure 2.7. The semantics is given as a relation of
the form $(s, h) \models_X Q$, where $s$ is a store, $h$ is a heap, $Q$ is a separation logic formula and $X$
is a partial mapping from inductive predicate names to the predicates’ denotations (which
are functions yielding sets of heaps). The relation $(s, h) \models_X Q$ is only defined when
$\mathit{dom}(X)$ contains all predicate names appearing in $Q$. We describe inductive predicates in
detail in the next section and focus on the other cases here. If $(s, h) \models_X Q$ holds for all
$s, h$, we denote this as $\models_X Q$.

The formula emp describes the empty heap. The formula $x \mapsto [f_1 : e_1, \ldots, f_n : e_n]$
describes a singleton heap where $x$ points to a record whose $f_1$ field contains the value
of $e_1$ and so on (as with the syntax for branches, we omit the $\epsilon$ that terminates the field
list when writing records). The field names $f_1, \ldots, f_n$ must be distinct. A store, heap
pair $(s, h)$ satisfies $Q_1 * Q_2$ iff $h$ is a union of domain-disjoint heaps $h_1$ and $h_2$ such
that $(s, h_1)$ satisfies $Q_1$ and $(s, h_2)$ satisfies $Q_2$. The binary operators $\wedge$ (conjunction), $\vee$
(disjunction), and $\Rightarrow$ (implication) have their usual semantics. For the binary operators,
the order of precedence, from strongest to weakest is: $\wedge$, $\vee$, $\Rightarrow$. The operators $\wedge$, $\vee$,
and $*$ are associative, so order of operations among sequences of formulae joined by the
same one of these operators at the same level does not matter.

Figure 2.5: Iteration number one of a loop that creates a singly-linked list.

Syntax of Separation Logic Formulae

$$
\begin{array}{lrcl}
\text{Inductive Predicates} & p^{\vec{\tau}}, r^{\vec{\tau}} & \in & \mathcal{P}^{\vec{\tau}} \\
\text{Records} & \rho & ::= & \epsilon \mid f^{\tau} : e^{\tau}, \rho \\
\text{Spatial Predicates} & \Sigma & ::= & \text{emp} \mid e^{a} \mapsto [\rho] \mid p^{\vec{\tau}}(\vec{e}^{\,\vec{\tau}}) \\
\text{Separation Logic Formulae} & Q & ::= & e^{b} \mid \Sigma \mid Q_1 * Q_2 \mid Q_1 \wedge Q_2 \mid Q_1 \vee Q_2 \mid \\
 & & & Q_1 \Rightarrow Q_2 \mid \exists x^{\tau}.\ Q \mid \forall x^{\tau}.\ Q
\end{array}
$$

Figure 2.6: Syntax of separation logic formulae.

We write $\vec{\tau}$ to represent the sequence of types $\tau_1 \tau_2 \ldots \tau_n$. Meta-variables $p^{\vec{\tau}}$ and $r^{\vec{\tau}}$
represent the names of inductive predicates. The superscript $\vec{\tau}$ encodes both the number
and types of the arguments the predicate expects. For example, $p^{iaa}$ is a predicate that takes
an integer-valued argument followed by two address-valued arguments. We write $\mathcal{P}^{\vec{\tau}}$ for
the set of all predicates of type $\vec{\tau}$. If $\vec{\tau} = \tau_1 \ldots \tau_n$, we write $\vec{x}^{\,\vec{\tau}}$ to denote a list of variables
$x_1^{\tau_1}, \ldots, x_n^{\tau_n}$. Similarly, we write $\vec{e}^{\,\vec{\tau}}$ to represent the list of expressions $e_1^{\tau_1}, \ldots, e_n^{\tau_n}$. We
discuss inductive predicates further in the next section.

2.2.1  Effect  of  Free  Variables 

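The free variables of a separation logic formula $Q$ are defined in Figure 2.8. We have a
result for separation logic formulae similar to Lemma 1, which involved expressions.

Semantics of Separation Logic Formulae

$$
\begin{array}{lcl}
\llbracket f^{\tau} : e^{\tau}, \rho \rrbracket\, s & = & \{(f^{\tau}, \llbracket e^{\tau} \rrbracket\, s)\} \cup (\llbracket \rho \rrbracket\, s) \\
\llbracket \epsilon \rrbracket\, s & = & \{\} \\
\llbracket e_1^{\tau_1}, \ldots, e_n^{\tau_n} \rrbracket\, s & = & (\llbracket e_1^{\tau_1} \rrbracket\, s, \ldots, \llbracket e_n^{\tau_n} \rrbracket\, s)
\end{array}
$$

$$
\begin{array}{lcl}
(s, h) \models_X e^{b} & \iff & \llbracket e^{b} \rrbracket\, s = \text{true} \\
(s, h) \models_X \text{emp} & \iff & h = \{\} \\
(s, h) \models_X e^{a} \mapsto [\rho] & \iff & h = \{(\llbracket e^{a} \rrbracket\, s, \llbracket \rho \rrbracket\, s)\} \\
(s, h) \models_X p^{\vec{\tau}}(\vec{e}^{\,\vec{\tau}}) & \iff & h \in (X(p^{\vec{\tau}})(\llbracket \vec{e}^{\,\vec{\tau}} \rrbracket\, s)) \\
(s, h) \models_X Q_1 \wedge Q_2 & \iff & (s, h) \models_X Q_1 \text{ and } (s, h) \models_X Q_2 \\
(s, h) \models_X Q_1 \vee Q_2 & \iff & (s, h) \models_X Q_1 \text{ or } (s, h) \models_X Q_2 \\
(s, h) \models_X Q_1 \Rightarrow Q_2 & \iff & (s, h) \models_X Q_1 \text{ implies } (s, h) \models_X Q_2 \\
(s, h) \models_X Q_1 * Q_2 & \iff & \text{there exist } h_1, h_2 \text{ such that} \\
 & & \mathit{dom}(h_1) \cap \mathit{dom}(h_2) = \emptyset \text{ and } h = h_1 \cup h_2 \text{ and} \\
 & & (s, h_1) \models_X Q_1 \text{ and } (s, h_2) \models_X Q_2 \\
(s, h) \models_X \exists x^{a/i}.\ Q & \iff & \text{there exists } v \in \text{Addr}/\mathbb{Z} \text{ such that } (s[x^{a/i} \to v], h) \models_X Q \\
(s, h) \models_X \forall x^{a/i}.\ Q & \iff & \text{for all } v \in \text{Addr}/\mathbb{Z} \text{ we have } (s[x^{a/i} \to v], h) \models_X Q \\
\models_X Q & \iff & \forall s, h.\ ((s, h) \models_X Q)
\end{array}
$$

Figure 2.7: Semantics of separation logic formulae. We have combined the $\exists$ rules for address and
integer-valued variables, using a “/” to separate the alternatives. The field names in any record $\rho$
must be distinct. The semantics of expressions, $\llbracket e \rrbracket\, s$, is given in Figure 2.2.

$$
\begin{array}{lcl}
\mathit{fv}(f^{\tau} : e^{\tau}, \rho) & = & \mathit{fv}(e^{\tau}) \cup \mathit{fv}(\rho) \\
\mathit{fv}(\epsilon) & = & \{\} \\
\mathit{fv}(\text{emp}) & = & \{\} \\
\mathit{fv}(e^{a} \mapsto [\rho]) & = & \mathit{fv}(e^{a}) \cup \mathit{fv}(\rho) \\
\mathit{fv}(p^{\vec{\tau}}(e_1^{\tau_1} \ldots e_n^{\tau_n})) & = & \mathit{fv}(e_1^{\tau_1}) \cup \ldots \cup \mathit{fv}(e_n^{\tau_n}) \\
\mathit{fv}(Q_1 * Q_2) & = & \mathit{fv}(Q_1) \cup \mathit{fv}(Q_2) \\
\mathit{fv}(Q_1 \wedge Q_2) & = & \mathit{fv}(Q_1) \cup \mathit{fv}(Q_2) \\
\mathit{fv}(Q_1 \vee Q_2) & = & \mathit{fv}(Q_1) \cup \mathit{fv}(Q_2) \\
\mathit{fv}(Q_1 \Rightarrow Q_2) & = & \mathit{fv}(Q_1) \cup \mathit{fv}(Q_2) \\
\mathit{fv}(\exists x^{\tau}.\ Q) & = & \mathit{fv}(Q) - \{x^{\tau}\} \\
\mathit{fv}(\forall x^{\tau}.\ Q) & = & \mathit{fv}(Q) - \{x^{\tau}\}
\end{array}
$$

Figure 2.8: The definition of the function $\mathit{fv}(Q)$, which gives the free variables of formula $Q$. If
$Q = e^{b}$, the free variables are as given in Definition 2.

Lemma 4. If $s =_V s'$ and $\mathit{fv}(Q) \subseteq V$ then for all $X, h$, we have $(s, h) \models_X Q$ if and only
if $(s', h) \models_X Q$.

Proof. The proof is by induction on the structure of $Q$.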


CASE $Q = e^{b}$: In this case, the definition of $\models_X$ from Figure 2.7 tells us that $(s, h) \models_X Q$
iff $\llbracket e^{b} \rrbracket\, s = \text{true}$. By Lemma 1 we then have that $\llbracket e^{b} \rrbracket\, s = \text{true}$ iff $\llbracket e^{b} \rrbracket\, s' = \text{true}$. This
implies $(s, h) \models_X Q$ iff $(s', h) \models_X Q$.

CASE $Q = \text{emp}$: In this case, $(s, h) \models_X \text{emp}$ iff $h = \{\}$. Since $s$ is not involved in the
definition of the semantics of emp, we easily have $(s, h) \models_X \text{emp}$ iff $(s', h) \models_X \text{emp}$.

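CASE $Q = e^{a} \mapsto [\rho]$: We first prove the following lemma:

$$\forall \rho, s, s'.\ (s =_V s') \wedge (\mathit{fv}(\rho) \subseteq V) \Rightarrow (\llbracket \rho \rrbracket\, s = \llbracket \rho \rrbracket\, s')$$

This is proved by structural induction on $\rho$. There are two cases. If $\rho = \epsilon$ then
$\llbracket \rho \rrbracket\, s = \{\}$ and $\llbracket \rho \rrbracket\, s' = \{\}$, implying $\llbracket \rho \rrbracket\, s = \llbracket \rho \rrbracket\, s'$. If $\rho = f^{\tau} : e^{\tau}, \rho'$ then we have
$\llbracket \rho \rrbracket\, s = \{(f^{\tau}, \llbracket e^{\tau} \rrbracket\, s)\} \cup (\llbracket \rho' \rrbracket\, s)$. By the induction hypothesis we have $\llbracket \rho' \rrbracket\, s = \llbracket \rho' \rrbracket\, s'$.
Since $\mathit{fv}(\rho) \subseteq V$ we have that $\mathit{fv}(e^{\tau}) \subseteq V$ and thus by Lemma 1 we have $\llbracket e^{\tau} \rrbracket\, s = \llbracket e^{\tau} \rrbracket\, s'$.
Combining these we have the following.

$$\{(f^{\tau}, \llbracket e^{\tau} \rrbracket\, s)\} \cup (\llbracket \rho' \rrbracket\, s) = \{(f^{\tau}, \llbracket e^{\tau} \rrbracket\, s')\} \cup (\llbracket \rho' \rrbracket\, s')$$

This is equivalent to $\llbracket \rho \rrbracket\, s = \llbracket \rho \rrbracket\, s'$, which is our goal.

Having proved the result for record expressions $\rho$, we can now turn back to $Q$. Since
$\mathit{fv}(Q) \subseteq V$ and $Q = e^{a} \mapsto [\rho]$, we have, as a consequence of Definition 2.2.1 that
$\mathit{fv}(e^{a}) \subseteq V$ and $\mathit{fv}(\rho) \subseteq V$. Thus, by Lemma 1 and by our intermediate lemma above, we
have $\llbracket e^{a} \rrbracket\, s = \llbracket e^{a} \rrbracket\, s'$ and $\llbracket \rho \rrbracket\, s = \llbracket \rho \rrbracket\, s'$. This implies

$$\{(\llbracket e^{a} \rrbracket\, s, \llbracket \rho \rrbracket\, s)\} = \{(\llbracket e^{a} \rrbracket\, s', \llbracket \rho \rrbracket\, s')\}$$

which implies $(s, h) \models_X Q \iff (s', h) \models_X Q$ by the definition of $\models_X$ given in Figure 2.7.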

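CASE $Q = p^{\vec{\tau}}(\vec{e}^{\,\vec{\tau}})$: We first consider the forward implication. We assume
$(s, h) \models_X p^{\vec{\tau}}(\vec{e}^{\,\vec{\tau}})$ and show $(s', h) \models_X p^{\vec{\tau}}(\vec{e}^{\,\vec{\tau}})$. We have from our semantics that
$(s, h) \models_X p^{\vec{\tau}}(\vec{e}^{\,\vec{\tau}})$ implies $h \in (X(p)(\llbracket \vec{e}^{\,\vec{\tau}} \rrbracket\, s))$. Since $\mathit{fv}(\vec{e}^{\,\vec{\tau}}) \subseteq V$ we have by Lemma
1 that $\llbracket \vec{e}^{\,\vec{\tau}} \rrbracket\, s = \llbracket \vec{e}^{\,\vec{\tau}} \rrbracket\, s'$. This implies

$$X(p)(\llbracket \vec{e}^{\,\vec{\tau}} \rrbracket\, s) = X(p)(\llbracket \vec{e}^{\,\vec{\tau}} \rrbracket\, s')$$

Since we have $h \in (X(p)(\llbracket \vec{e}^{\,\vec{\tau}} \rrbracket\, s))$ this lets us conclude $h \in (X(p)(\llbracket \vec{e}^{\,\vec{\tau}} \rrbracket\, s'))$ which
implies $(s', h) \models_X Q$. The backward implication is the same with $s$ and $s'$ reversed.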

CASE $Q = Q_1 * Q_2$: We have $(s, h) \models_X Q_1 * Q_2$ iff there exist $h_1, h_2$ such that
$\mathit{dom}(h_1) \cap \mathit{dom}(h_2) = \emptyset$ and $h = h_1 \cup h_2$ and $(s, h_1) \models_X Q_1$ and $(s, h_2) \models_X Q_2$.
That $\mathit{fv}(Q) \subseteq V$ implies $\mathit{fv}(Q_1) \subseteq V$ and $\mathit{fv}(Q_2) \subseteq V$. We can then apply the induction
hypothesis, which gives us that $(s, h_1) \models_X Q_1$ iff $(s', h_1) \models_X Q_1$ and similarly for $Q_2$.
This implies our result.

CASE $Q = Q_1 \wedge Q_2$: We have $(s, h) \models_X Q_1 \wedge Q_2$ iff $(s, h) \models_X Q_1$ and $(s, h) \models_X Q_2$.
Again, $\mathit{fv}(Q) \subseteq V$ implies $\mathit{fv}(Q_1) \subseteq V$ and $\mathit{fv}(Q_2) \subseteq V$, allowing us to apply the
inductive hypothesis and obtain $(s, h) \models_X Q_1$ iff $(s', h) \models_X Q_1$ (and similarly for
$(s', h) \models_X Q_2$). This implies our result.

CASE $Q = Q_1 \vee Q_2$: This case is very similar to the $*$ and $\wedge$ cases. We have
$(s, h) \models_X Q_1 \vee Q_2$ iff $(s, h) \models_X Q_1$ or $(s, h) \models_X Q_2$. In either case, we have $\mathit{fv}(Q_i) \subseteq V$
and apply our inductive hypothesis to obtain $(s, h) \models_X Q_i$ iff $(s', h) \models_X Q_i$, which lets
us conclude that $(s, h) \models_X Q$ iff $(s', h) \models_X Q$.

CASE $Q = (Q_1 \Rightarrow Q_2)$: We will consider the forward direction first and show that
$(s, h) \models_X (Q_1 \Rightarrow Q_2)$ implies $(s', h) \models_X (Q_1 \Rightarrow Q_2)$. Suppose $(s, h) \models_X (Q_1 \Rightarrow Q_2)$.
Then by the definition of $\models_X$ given in Figure 2.7 we have $(s, h) \models_X Q_1$ implies
$(s, h) \models_X Q_2$. Now, suppose $(s', h) \models_X Q_1$. Since $\mathit{fv}(Q) = \mathit{fv}(Q_1) \cup \mathit{fv}(Q_2)$ and
$\mathit{fv}(Q) \subseteq V$, we have $\mathit{fv}(Q_1) \subseteq V$ and $\mathit{fv}(Q_2) \subseteq V$. This lets us apply our inductive
hypothesis, obtaining $(s, h) \models_X Q_1$. This implies $(s, h) \models_X Q_2$ by our assumption, which,
applying the inductive hypothesis again, gives us $(s', h) \models_X Q_2$. Thus, we have shown
that $(s', h) \models_X Q_1$ implies $(s', h) \models_X Q_2$, which lets us conclude $(s', h) \models_X (Q_1 \Rightarrow Q_2)$.
The proof of the backwards direction is the same, with $s$ and $s'$ interchanged.

CASE $Q = \exists x.\ Q'$: We consider the forward direction first. The relation $(s, h) \models_X \exists x.\ Q'$
implies there exists a $v$ such that $(s[x \to v], h) \models_X Q'$. Consider the store $s'[x \to v]$.
Since $s =_V s'$, we have $s[x \to v] =_{V \cup \{x\}} s'[x \to v]$. We have that $\mathit{fv}(Q) = \mathit{fv}(Q') - \{x\}$
and $\mathit{fv}(Q) \subseteq V$ which implies $\mathit{fv}(Q') \subseteq V \cup \{x\}$. We can then apply our inductive
hypothesis to $(s[x \to v], h) \models_X Q'$, obtaining $(s'[x \to v], h) \models_X Q'$. This implies
$(s', h) \models_X \exists x.\ Q'$. The backward direction is the same, with $s$ and $s'$ interchanged.

CASE $Q = \forall x.\ Q'$: We consider the forward direction first. The relation $(s, h) \models_X \forall x.\ Q'$
implies that for all $v$ we have $(s[x \to v], h) \models_X Q'$. Consider an arbitrary $v'$.
Instantiating $v$ above with $v'$ we have $(s[x \to v'], h) \models_X Q'$. Since $s =_V s'$, we have
$s[x \to v'] =_{V \cup \{x\}} s'[x \to v']$. We have that $\mathit{fv}(Q) = \mathit{fv}(Q') - \{x\}$ and $\mathit{fv}(Q) \subseteq V$
which implies $\mathit{fv}(Q') \subseteq V \cup \{x\}$. We can then apply our inductive hypothesis to
$(s[x \to v'], h) \models_X Q'$, obtaining $(s'[x \to v'], h) \models_X Q'$. Since $v'$ was arbitrary, we
conclude that for all $v'$ we have $(s'[x \to v'], h) \models_X Q'$, which implies $(s', h) \models_X \forall x.\ Q'$.
The backward direction is the same, with $s$ and $s'$ interchanged. □

2.2.2  Defining  Inductive  Pointer  Structures 

We follow an approach similar to Brotherston [2007] in our treatment of inductively-defined
predicates. Pointer structures in our system are described inductively using definitions
of the following form.

Definition List   $\mathcal{D} ::= \epsilon \mid (p^{\vec{\tau}}(\vec{x}^{\,\vec{\tau}}) = Q) :: \mathcal{D}$

The symbol $\epsilon$ represents an empty sequence of definitions. $\mathcal{D}$ then specifies a set of
mutually inductive predicates. We require for each definition $p^{\vec{\tau}}(\vec{x}^{\,\vec{\tau}}) = Q$ that all variables
in $\vec{x}^{\,\vec{\tau}}$ are distinct, that $\mathit{fv}(Q) \subseteq \vec{x}$, and that all predicates $p^{\vec{\tau}}$ occurring to the left of $=$ in
$\mathcal{D}$ are distinct. We also do not allow implication or universal quantification to appear in $Q$
(and recall that $Q$ also cannot contain negated spatial predicates according to the grammar
in Figure 2.6).

As the constraints on type and arity of predicates and type and length of argument
vectors are standard and generally clear from context, we will henceforth write predicates
and vectors without mentioning arity or length except when necessary for clarity. For
example, we will write $p(\vec{x})$ to represent $p^{\vec{\tau}}(\vec{x}^{\,\vec{\tau}})$ for some $\vec{\tau}$ implicitly given by context.

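We will write $(p(\vec{x}) = Q) \in \mathcal{D}$ when the definition $p(\vec{x}) = Q$ appears in $\mathcal{D}$. We
require that if $(p(\vec{x}) = Q) \in \mathcal{D}$ and the predicate instance $p'(\vec{e}^{\,\vec{\tau}})$ appears in $Q$ then
$(p'(\vec{y}^{\,\vec{\tau}}) = Q') \in \mathcal{D}$ for some $\vec{y}^{\,\vec{\tau}}$ and $Q'$. This ensures that all predicates referenced in the
inductive definitions are defined. We write $\mathit{dom}(\mathcal{D})$ to refer to the set of predicates being
defined by $\mathcal{D}$. This is defined inductively as follows.

$$
\begin{array}{lcl}
\mathit{dom}((p(\vec{x}) = Q) :: \mathcal{D}) & = & \{p\} \cup \mathit{dom}(\mathcal{D}) \\
\mathit{dom}(\epsilon) & = & \emptyset
\end{array}
$$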

As an example of an inductive definition, consider the following definition of a doubly-linked
list segment with length $n$ starting at heap cell first and ending at last. The parameter
prev records the value of the prev field of the first cell in this list and next records the value
in the next field of the last cell.

$$
\begin{array}{l}
\text{dll}(n, \text{prev}, \text{first}, \text{last}, \text{next}) = \\
\quad \text{emp} \wedge n = 0 \wedge \text{first} = \text{next} \wedge \text{last} = \text{prev} \\
\quad \vee\ (\exists z.\ (\text{first} \mapsto [\text{prev} : \text{prev}, \text{next} : z]) * \text{dll}(n - 1, \text{first}, z, \text{last}, \text{next})) \wedge n > 0
\end{array}
$$

The disjunction indicates that there are two possible cases for a list segment with length $n$.
Either $n = 0$ and the list is empty, or $n > 0$ and there is an allocated heap cell at the head
of the list and a separate tail of length $n - 1$.
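
The two cases of the dll definition translate into a short recursive check on a concrete heap. This is a minimal sketch, assuming heaps are Python dictionaries from addresses to records with `prev` and `next` fields and that nil is modelled as 0; the function is an illustration of unfolding the definition, not the analysis described in the thesis.

```python
def dll(h, n, prev, first, last, nxt):
    """True iff heap h is exactly a doubly-linked list segment of length n."""
    if n == 0:
        # Base case: emp, first = next, last = prev.
        return h == {} and first == nxt and last == prev
    if n < 0 or first not in h:
        return False
    cell = h[first]
    if set(cell) != {"prev", "next"} or cell["prev"] != prev:
        return False
    z = cell["next"]                          # the existential witness is forced by the heap
    rest = {a: r for a, r in h.items() if a != first}   # the separate tail
    return dll(rest, n - 1, first, z, last, nxt)

# A three-cell list 1 <-> 2 <-> 3, terminated by nil (0) at both ends.
h = {
    1: {"prev": 0, "next": 2},
    2: {"prev": 1, "next": 3},
    3: {"prev": 2, "next": 0},
}
print(dll(h, 3, 0, 1, 3, 0))
```

Removing `first` from the heap before the recursive call plays the role of the separating conjunction: the head cell and the tail must occupy disjoint parts of the heap.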

The semantics of inductive predicates is defined in terms of iterated expansion. We
begin with the following definition.

Definition 4. Let $o(\tau)$ be the function defined such that $o(a) = \text{Addr}$ and $o(i) = \mathbb{Z}$. We
extend $o$ to vectors, letting $o(\tau_1 \ldots \tau_n) = o(\tau_1) \times \ldots \times o(\tau_n)$.

We then view an inductively-defined predicate of arity $\vec{\tau}$ as a function of type
$o(\vec{\tau}) \to 2^{\text{Heaps}}$, which maps values for the parameters to the set of heaps that satisfy
the predicate. We will call such a function an interpretation function and define this as
follows.

Definition 5. If $N$ is a set of predicate names, the set of interpretation functions $\mathcal{X}_N$ is
defined as follows.

$$\mathcal{X}_N = \bigcup_{p^{\vec{\tau}} \in N} \left(\{p^{\vec{\tau}}\} \to (o(\vec{\tau}) \to 2^{\text{Heaps}})\right)$$

In the type above, we use a union over functions with a singleton domain $\{p^{\vec{\tau}}\}$ to indicate
that the range of the function depends on the type $\vec{\tau}$ of the argument $p^{\vec{\tau}}$. Note that
$\mathit{dom}(\mathcal{X}_N) = N$.
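
As a small worked instance of Definitions 4 and 5 (an illustration added here, not part of the original development), take the name set containing only the dll predicate defined earlier. Its arity is $iaaaa$: one integer parameter followed by four address parameters. Unfolding the definitions gives

$$
\begin{aligned}
o(iaaaa) &= \mathbb{Z} \times \text{Addr} \times \text{Addr} \times \text{Addr} \times \text{Addr} \\
\mathcal{X}_{\{\text{dll}\}} &= \{\text{dll}\} \to \left(o(iaaaa) \to 2^{\text{Heaps}}\right)
\end{aligned}
$$

so an interpretation function for this name set assigns to dll a function that takes values for $(n, \text{prev}, \text{first}, \text{last}, \text{next})$ and returns the set of heaps regarded as satisfying the predicate at those arguments.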

The meaning of a list of inductively defined predicates $\mathcal{D}$ is then an element of the set
$\mathcal{X}_{\mathit{dom}(\mathcal{D})}$. We devote the remainder of the section to discussing appropriate elements of
$\mathcal{X}_{\mathit{dom}(\mathcal{D})}$ to take as the semantics of $\mathcal{D}$.

Fixed-Point  Semantics 

Let $\mathcal{D}$ be the following list of inductive definitions

$$(p_1(\vec{x}_1) = Q_1) :: \cdots :: (p_n(\vec{x}_n) = Q_n)$$

with the arity of $p_i$ equal to $\vec{\tau}_i$. Let $X$ be an element of $\mathcal{X}_{\mathit{dom}(\mathcal{D})}$. We will write $s[\vec{x} \to \vec{v}]$ for
the store $s'$ such that $s'(y) = v_i$ if $y = x_i$ for some $i$ and $s'(y) = s(y)$ otherwise. We use
lambda notation to denote functions at the meta-level and write $\lambda \vec{v}.\ t$ as an abbreviation
for $\lambda v_1.\ \lambda v_2.\ \cdots \lambda v_n.\ t$ where $t$ is some term in the meta-language. As always, we require
that the types of the $\vec{x}$ and the domains from which the $\vec{v}$ are drawn match, so that if
$x_i$ has type $a$ then $v_i \in \text{Addr}$ (and similarly for type $i$ and $\mathbb{Z}$). Let $\omega_{\mathcal{D}}$ be a function of type
$\mathcal{X}_{\mathit{dom}(\mathcal{D})} \to \mathcal{X}_{\mathit{dom}(\mathcal{D})}$ defined as follows.

$$\omega_{\mathcal{D}}(X) = \bigcup_{(p(\vec{x}) = Q) \in \mathcal{D}} \{(p, Y) \mid Y = \lambda \vec{v}.\ \{h \mid \exists s.\ (s[\vec{x} \to \vec{v}], h) \models_X Q\}\}$$

Intuitively, this operator corresponds to taking $X$ as the current approximation of the
meaning of the definitions in $\mathcal{D}$, and adding the heaps that are satisfied when we expand
the definitions once.
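
The expansion operator can be run on a toy instance. The sketch below makes several assumptions that are not in the thesis: a single predicate ls(x), defined as emp with x = nil, or a cell at x with a next field z separated from ls(z); a two-address universe; nil modelled as 0; and heaps encoded as frozensets of (address, next) pairs. It applies one expansion step repeatedly, starting from the empty interpretation, until the interpretation stops changing.

```python
from itertools import product

NIL = 0
ADDRS = [1, 2]                   # assumption: a tiny finite address space
VALUES = [NIL] + ADDRS

def all_heaps():
    """Every heap over ADDRS whose cells have a single next field."""
    heaps = []
    for choice in product([None] + VALUES, repeat=len(ADDRS)):
        heaps.append(frozenset((a, v) for a, v in zip(ADDRS, choice)
                               if v is not None))
    return heaps

HEAPS = all_heaps()

def omega(X):
    """Expand the definition of ls once, reading recursive calls from X."""
    Y = {}
    for v in VALUES:
        satisfied = set()
        for h in HEAPS:
            cells = dict(h)
            if not cells and v == NIL:        # first disjunct: emp and x = nil
                satisfied.add(h)
            elif v in cells:                  # second disjunct: head cell * ls(z)
                rest = frozenset(p for p in h if p[0] != v)
                if rest in X[cells[v]]:
                    satisfied.add(h)
        Y[v] = frozenset(satisfied)
    return Y

def iterate_to_fixed_point():
    """Apply omega from the empty interpretation until nothing changes."""
    X = {v: frozenset() for v in VALUES}
    while True:
        Y = omega(X)
        if Y == X:
            return X
        X = Y

X = iterate_to_fixed_point()
```

In this finite setting the loop terminates, and the resulting `X` satisfies `omega(X) == X`. The cyclic heap in which cell 1 points to itself is not included in `X[1]`, because no finite number of expansions ever produces it.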

A fixed-point of $\omega_{\mathcal{D}}$ is any $X \in \mathcal{X}_{\mathit{dom}(\mathcal{D})}$ such that $\omega_{\mathcal{D}}(X) = X$. Any fixed-point
of $\omega_{\mathcal{D}}$ may be taken as the meaning for a set of inductive definitions without introducing
inconsistency into the system. The tool that we discuss in Chapter 5 makes no assumptions
about which fixed-point has been chosen, and thus its conclusions are sound for all
fixed-points. In order to formalize this, we introduce the following definition of satisfaction with
respect to a set of inductive definitions.

Definition 6. Let $\mathcal{D}$ be a set of inductive predicate definitions. Then we define satisfaction
of $Q$ with respect to $\mathcal{D}$ as follows.

$$(s, h) \models_{\mathcal{D}} Q \text{ iff } (s, h) \models_X Q \text{ for all } X \in \mathcal{X}_{\mathit{dom}(\mathcal{D})} \text{ such that } \omega_{\mathcal{D}}(X) = X$$

This will be the definition of satisfaction that we will use throughout the thesis as it
most closely captures the behavior of our static analysis tool. However, it is important to
ensure that the universal quantification in the definition above is not vacuously satisfied. If
there are no fixed-points for $\omega_{\mathcal{D}}$ then $(s, h) \models_{\mathcal{D}} Q$ is trivially satisfied for all $s, h, Q$, i.e.
the logic becomes inconsistent. We turn now to this issue, showing that $\omega_{\mathcal{D}}$ does in fact
always have fixed-points. Furthermore, these fixed-points are partially ordered and there
is always a least fixed-point with respect to this ordering.

Least  Fixed-Points 

We first prove the following lemma, which states that if the denotations of predicates
given by $X'$ include more states than those given by $X$, then satisfaction with respect to $X$
implies satisfaction with respect to $X'$. The fact that implication is not allowed in inductive
predicate definitions is crucial for this lemma.

Lemma 5. Suppose $X \in \mathcal{X}_N$ and $X' \in \mathcal{X}_N$ for some $N$. Then

$$\forall p, \vec{v}.\ (p \in N) \Rightarrow X(p)(\vec{v}) \subseteq X'(p)(\vec{v}) \qquad (2.4)$$

implies

$$\forall s, h.\ ((s, h) \models_X Q) \Rightarrow ((s, h) \models_{X'} Q)$$

Proof. The proof is by induction on the structure of $Q$.

CASE Base Cases Not Involving Inductive Predicates: The base cases not involving
inductive predicates are $Q = e_b$, $Q = \mathsf{emp}$, and $Q = e_a \mapsto [\rho]$. In each case, the
satisfaction relation does not depend on the predicate meanings provided. For example, suppose
$Q = e_b$. Then we have $(s, h) \models_X e_b$, which implies $\llbracket e_b \rrbracket\, s = \mathit{true}$. This then implies
$(s, h) \models_{X'} e_b$, which is our goal.

CASE Inductive Cases: Since we have disallowed implication in the body of inductive
definitions, the inductive cases all follow directly from the inductive hypothesis. To give
an example, suppose $Q = \exists x_a.\ Q'$. Then we have $(s, h) \models_X \exists x_a.\ Q'$ and must show
$(s, h) \models_{X'} \exists x_a.\ Q'$. According to the definition of satisfaction (Figure 2.7) our assumption
implies that for some $v \in Addr$ we have $(s[x_a \to v], h) \models_X Q'$. Our inductive hypothesis
then gives us $(s[x_a \to v], h) \models_{X'} Q'$. Thus, we have $(s[x_a \to v], h) \models_{X'} Q'$ for some
$v \in Addr$, which implies $(s, h) \models_{X'} \exists x_a.\ Q'$.

CASE Inductive Predicates: This is the only non-trivial case. We have $Q = p(\vec{e})$.
According to the semantics in Figure 2.7 we have that $(s, h) \models_X p(\vec{e})$ implies $h \in X(p)(s(\vec{e}))$.
As we have assumed that $(s, h) \models_X Q$ is only defined when the predicate names appearing
in $Q$ are in the domain of $X$, we also have that $p \in dom(X)$, which implies $p \in N$. We can
now apply assumption (2.4) to obtain $h \in X'(p)(s(\vec{e}))$. This implies $(s, h) \models_{X'} p(\vec{e})$,
which is our goal. □

We next show that the following lemma holds of our definition of $\omega_{\mathcal{D}}$. This will serve
as the basis for establishing a monotonicity property.

Lemma 6. Suppose $X \in \mathcal{X}_{dom(\mathcal{D})}$ and $X' \in \mathcal{X}_{dom(\mathcal{D})}$. Then

$$\forall p, \vec{v}.\ (p \in dom(X)) \Rightarrow X(p)(\vec{v}) \subseteq X'(p)(\vec{v}) \qquad (2.5)$$

implies

$$\forall p, \vec{v}.\ (p \in dom(\mathcal{D})) \Rightarrow \omega_{\mathcal{D}}(X)(p)(\vec{v}) \subseteq \omega_{\mathcal{D}}(X')(p)(\vec{v})$$

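Proof. Assume $X \in \mathcal{X}_{dom(\mathcal{D})}$ and $X' \in \mathcal{X}_{dom(\mathcal{D})}$ and suppose we have

$$\forall p, \vec{v}.\ (p \in dom(X)) \Rightarrow X(p)(\vec{v}) \subseteq X'(p)(\vec{v})$$

Let $p$ be an arbitrary predicate name in $dom(\mathcal{D})$ and $\vec{v}$ be a list of values. We must show

$$\omega_{\mathcal{D}}(X)(p)(\vec{v}) \subseteq \omega_{\mathcal{D}}(X')(p)(\vec{v}) \qquad (2.6)$$

Expanding the definitions of $\omega_{\mathcal{D}}(X)(p)(\vec{v})$ and $\omega_{\mathcal{D}}(X')(p)(\vec{v})$ we obtain the following,
where $Q$ is the body of the definition of $p$ (that is, $(p(\vec{x}) = Q) \in \mathcal{D}$ for some $\vec{x}$).

$$\omega_{\mathcal{D}}(X)(p)(\vec{v}) = \{h \mid \exists s.\ (s[\vec{x} \to \vec{v}], h) \models_X Q\}$$
$$\omega_{\mathcal{D}}(X')(p)(\vec{v}) = \{h \mid \exists s.\ (s[\vec{x} \to \vec{v}], h) \models_{X'} Q\}$$

Given these definitions, (2.6) is equivalent to the following.

$$\{h \mid \exists s.\ (s[\vec{x} \to \vec{v}], h) \models_X Q\} \subseteq \{h \mid \exists s.\ (s[\vec{x} \to \vec{v}], h) \models_{X'} Q\}$$

This holds if and only if the following holds for all $h$.

$$(\exists s.\ (s[\vec{x} \to \vec{v}], h) \models_X Q) \Rightarrow (\exists s.\ (s[\vec{x} \to \vec{v}], h) \models_{X'} Q)$$

This follows from Lemma 5. We have $(s[\vec{x} \to \vec{v}], h) \models_X Q$ for some $s$. By Lemma 5 and
our assumption (2.5), we have

$$\forall s, h.\ ((s, h) \models_X Q) \Rightarrow ((s, h) \models_{X'} Q)$$

Applying the above with $s[\vec{x} \to \vec{v}]$ substituted for $s$ then gives us $(s[\vec{x} \to \vec{v}], h) \models_{X'} Q$,
which implies our goal of $\exists s.\ (s[\vec{x} \to \vec{v}], h) \models_{X'} Q$. □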

A corollary of this lemma is that $\omega_{\mathcal{D}}$ is monotone with respect to $\sqsubseteq$, an ordering on
functions defined as follows.

Definition 7. Let $X_1$ and $X_2$ be elements in $\mathcal{X}_N$ for some $N$. Then we define the ordering
$\sqsubseteq$ as follows.

$$X_1 \sqsubseteq X_2 \quad\text{iff}\quad \forall p, \vec{v}.\ (p \in N) \Rightarrow X_1(p)(\vec{v}) \subseteq X_2(p)(\vec{v})$$

The set of names $N$ will always be clear from context, so we do not include it in the
notation for the order $\sqsubseteq$.

The  monotonicity  result  is  then  the  following. 

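Theorem 2. If $X \in \mathcal{X}_{dom(\mathcal{D})}$ and $X' \in \mathcal{X}_{dom(\mathcal{D})}$ and $X \sqsubseteq X'$ then $\omega_{\mathcal{D}}(X) \sqsubseteq \omega_{\mathcal{D}}(X')$.

Proof. We must show the following.

$$\forall p, \vec{v}.\ (p \in dom(\mathcal{D})) \Rightarrow \omega_{\mathcal{D}}(X)(p)(\vec{v}) \subseteq \omega_{\mathcal{D}}(X')(p)(\vec{v})$$

Our assumption that $X \sqsubseteq X'$ gives us the following.

$$\forall p, \vec{v}.\ (p \in dom(\mathcal{D})) \Rightarrow X(p)(\vec{v}) \subseteq X'(p)(\vec{v})$$

Applying Lemma 6 then yields our goal. □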

Next, we define the following operation on sets of functions $X_i$.

Definition 8. For any set $\{X_0, X_1, \ldots\}$ of functions in $\mathcal{X}_N$, let $\bigsqcup_i X_i$ be defined as follows.

$$\bigsqcup_i X_i = \{(p,\ \lambda \vec{v}.\ \bigcup_i X_i(p)(\vec{v})) \mid p \in N\}$$

This operation gives the supremum of the set $\{X_0, X_1, \ldots\}$.

Theorem 3. $\bigsqcup_i X_i$ is the supremum of the set $\{X_0, X_1, \ldots\}$ with respect to the order $\sqsubseteq$.

Proof. We must show that $\forall i.\ X_i \sqsubseteq \bigsqcup_i X_i$ and

$$\forall X.\ (\forall i.\ X_i \sqsubseteq X) \Rightarrow \bigsqcup_i X_i \sqsubseteq X$$

or informally, that $\bigsqcup_i X_i$ is an upper bound and that it is the least upper bound.

Upper Bound We first show $\forall i.\ X_i \sqsubseteq \bigsqcup_i X_i$. Choose some $X_j$. We must show that
$X_j \sqsubseteq \bigsqcup_i X_i$. This holds if $\forall p, \vec{v}.\ (p \in N) \Rightarrow X_j(p)(\vec{v}) \subseteq (\bigsqcup_i X_i)(p)(\vec{v})$. Expanding the
definition of $\bigsqcup_i X_i$ and applying the function, we have to show the following.

$$\forall p, \vec{v}.\ (p \in N) \Rightarrow \Bigl(X_j(p)(\vec{v}) \subseteq \bigcup_i (X_i(p)(\vec{v}))\Bigr)$$

This holds since $\bigcup_i X_i(p)(\vec{v})$ contains $X_j(p)(\vec{v})$ (there is some $i$ in this union such that
$i = j$, which guarantees the inclusion).

Least Upper Bound We now show the following.

$$\forall X.\ (\forall i.\ X_i \sqsubseteq X) \Rightarrow \bigsqcup_i X_i \sqsubseteq X$$

We consider some $X$ such that $(\forall i.\ X_i \sqsubseteq X)$ and show $\bigsqcup_i X_i \sqsubseteq X$. We must show the
following.

$$\forall p, \vec{v}.\ (p \in N) \Rightarrow \Bigl(\bigl(\bigsqcup_i X_i\bigr)(p)(\vec{v}) \subseteq X(p)(\vec{v})\Bigr) \qquad (2.7)$$

Our assumption $(\forall i.\ X_i \sqsubseteq X)$ implies the following.

$$\forall p, \vec{v}.\ (p \in N) \Rightarrow \forall i.\ X_i(p)(\vec{v}) \subseteq X(p)(\vec{v}) \qquad (2.8)$$

Expanding the definition of $\bigsqcup_i X_i$ in (2.7) and reducing the function application, we
find that we must show

$$\forall p, \vec{v}.\ (p \in N) \Rightarrow \Bigl(\bigcup_i (X_i(p)(\vec{v})) \subseteq X(p)(\vec{v})\Bigr)$$

This follows from (2.8) and the fact that $\bigcup_i (X_i(p)(\vec{v}))$ is the supremum of the set
$\{X_0(p)(\vec{v}), X_1(p)(\vec{v}), \ldots\}$. □

That $\omega_{\mathcal{D}}$ is monotone with respect to $\sqsubseteq$ and $\bigsqcup$ is the supremum with respect to $\sqsubseteq$
implies that $\omega_{\mathcal{D}}$ has a least fixed-point.

Theorem 4. $\omega_{\mathcal{D}}$ has a least fixed-point.

Proof. We first note that Theorem 3 implies that the lattice of interpretation functions $X$ is
complete. The current theorem then follows from Lemma 2 and application of the Tarski
fixed-point  theorem.  □ 
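
As an illustration of Definitions 7 and 8, the following minimal Python sketch represents an interpretation over a finite universe as a dictionary from (predicate name, argument tuple) pairs to sets of heaps, and checks both halves of Theorem 3 on one small instance. The dictionary encoding, the names `leq` and `join`, and the opaque heap labels are assumptions made purely for illustration; they are not part of the formal development.

```python
def leq(x1, x2):
    """Pointwise inclusion: x1 is below x2 when each denotation in x1 is a subset of x2's."""
    keys = set(x1) | set(x2)
    return all(x1.get(k, frozenset()) <= x2.get(k, frozenset()) for k in keys)


def join(xs):
    """Pointwise union of a collection of interpretations."""
    out = {}
    for x in xs:
        for key, heaps in x.items():
            out[key] = out.get(key, frozenset()) | heaps
    return out


# Two hypothetical interpretations of a predicate "ls"; heaps are opaque labels here.
X0 = {("ls", (0,)): frozenset({"h_empty"})}
X1 = {("ls", (0,)): frozenset({"h_empty"}), ("ls", (1,)): frozenset({"h_cell"})}
J = join([X0, X1])

# U is some other upper bound of X0 and X1.
U = {("ls", (0,)): frozenset({"h_empty", "h_other"}), ("ls", (1,)): frozenset({"h_cell"})}
```

Here `J` is above both `X0` and `X1`, and below the other upper bound `U`, which is the shape of the two proof obligations in Theorem 3.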


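Continuity Let $\bot = \{(p, \lambda \vec{x}.\ \emptyset) \mid p \in dom(\mathcal{D})\}$. Not only does $\omega_{\mathcal{D}}$ have a least
fixed-point, but this fixed-point is the least upper bound of the increasing chain
$\omega_{\mathcal{D}}^0, \omega_{\mathcal{D}}^1, \ldots$, where $\omega_{\mathcal{D}}^i$ for $i \in \mathbb{N}$ is defined as follows.

$$\omega_{\mathcal{D}}^0 = \bot$$
$$\omega_{\mathcal{D}}^{i+1} = \omega_{\mathcal{D}}(\omega_{\mathcal{D}}^i)$$

This is captured by the following theorems. These all rely on the fact that universal
quantification is not permitted in inductive predicate definitions.

Theorem 5. $\omega_{\mathcal{D}}$ is continuous.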


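Proof. We have shown that $\bigsqcup$ is the least upper bound. We must show that $\omega_{\mathcal{D}}$
preserves least upper bounds of directed sets (the definition of Scott continuity). Consider
a set $\mathbb{X}$ of functions in $\mathcal{X}_{dom(\mathcal{D})}$ such that for all $i, j$, if $X_i \in \mathbb{X}$ and $X_j \in \mathbb{X}$ then
$\exists X_k.\ X_k \in \mathbb{X} \land X_i \sqsubseteq X_k \land X_j \sqsubseteq X_k$ (that is, $\mathbb{X}$ is a directed set). We must show
that $\omega_{\mathcal{D}}(\bigsqcup \mathbb{X}) = \bigsqcup(\omega_{\mathcal{D}}(\mathbb{X}))$ where $\omega_{\mathcal{D}}(\mathbb{X}) = \{\omega_{\mathcal{D}}(X) \mid X \in \mathbb{X}\}$. Expanding the
definition of $\omega_{\mathcal{D}}$, we have the following for the left side of the equality.

$$\bigcup_{(p(\vec{x}) = Q) \in \mathcal{D}} \{(p, Y) \mid Y = \lambda \vec{v}.\ \{h \mid \exists s.\ (s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} Q\}\}$$

The right side becomes the following.

$$\bigsqcup_{X \in \mathbb{X}}\ \bigcup_{(p(\vec{x}) = Q) \in \mathcal{D}} \{(p, Y) \mid Y = \lambda \vec{v}.\ \{h \mid \exists s.\ (s[\vec{x} \to \vec{v}], h) \models_X Q\}\}$$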

Applying the definition of $\bigsqcup$ (Definition 8), the right side expands to the following.

$$\bigcup_{(p(\vec{x}) = Q) \in \mathcal{D}} \{(p, Y) \mid Y = \lambda \vec{v}.\ \bigcup_i \{h \mid (\exists s.\ (s[\vec{x} \to \vec{v}], h) \models_{X_i} Q) \land X_i \in \mathbb{X}\}\}$$

Continuity will then be implied if we can show the following for all $Q$ of our restricted
form (formulas not containing implication or universal quantification).

$$\{h \mid \exists s.\ (s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} Q\} = \bigcup_i \{h \mid (\exists s.\ (s[\vec{x} \to \vec{v}], h) \models_{X_i} Q) \land X_i \in \mathbb{X}\}$$

Since an element is in the union on the right exactly when it is in one of the sets being
unioned, the statement above holds if and only if we have the following for all $h$.

$$(\exists s.\ (s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} Q) \iff (\exists X_i \in \mathbb{X}.\ (\exists s.\ (s[\vec{x} \to \vec{v}], h) \models_{X_i} Q))$$

The right-to-left direction of the equivalence follows immediately from Lemma 5 and
the fact that for all $X_i \in \mathbb{X}$ we have $X_i \sqsubseteq \bigsqcup \mathbb{X}$.

We show the left-to-right direction by showing the following, stronger statement by
induction on the structure of $Q$.

$$\forall s.\ ((s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} Q) \Rightarrow \exists s'.\ (s =_{fv(Q)} s') \land (\exists X_i \in \mathbb{X}.\ ((s'[\vec{x} \to \vec{v}], h) \models_{X_i} Q))$$

CASE Base Cases Not Involving Inductive Predicates: The base cases not involving
inductive predicates are $Q = e_b$, $Q = \mathsf{emp}$, and $Q = e_a \mapsto [\rho]$. In each case, the satisfaction
relation does not depend on the predicate meanings provided. For example, suppose $Q = e_b$.
Then we have $(s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} e_b$, which is true if and only if $\llbracket e_b \rrbracket (s[\vec{x} \to \vec{v}]) = \mathit{true}$.
This implies $(s[\vec{x} \to \vec{v}], h) \models_{X_i} e_b$ for all $X_i$, thus implying our goal (we trivially have
$s =_{fv(Q)} s$, which is the other portion of the goal formula).

CASE $Q = Q_1 * Q_2$: We assume that we have the following.

$$(s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} Q_1 * Q_2$$

The semantics of $\models_{\bigsqcup \mathbb{X}}$ then implies that there exist heaps $h_1$ and $h_2$ such that
$dom(h_1) \cap dom(h_2) = \emptyset$ and $h = h_1 \cup h_2$ and $(s[\vec{x} \to \vec{v}], h_1) \models_{\bigsqcup \mathbb{X}} Q_1$ and
$(s[\vec{x} \to \vec{v}], h_2) \models_{\bigsqcup \mathbb{X}} Q_2$. Our inductive hypothesis then gives us the following

$$\exists s'.\ (s =_{fv(Q_1)} s') \land \exists X_i \in \mathbb{X}.\ ((s'[\vec{x} \to \vec{v}], h_1) \models_{X_i} Q_1)$$

and

$$\exists s''.\ (s =_{fv(Q_2)} s'') \land \exists X_j \in \mathbb{X}.\ ((s''[\vec{x} \to \vec{v}], h_2) \models_{X_j} Q_2)$$

Let $s'$ and $s''$ be as above. Since $s =_{fv(Q_1)} s'$ and $s =_{fv(Q_2)} s''$ we can apply Lemma 4
to the formulas above to obtain

$$\exists X_i \in \mathbb{X}.\ ((s[\vec{x} \to \vec{v}], h_1) \models_{X_i} Q_1)$$

and

$$\exists X_j \in \mathbb{X}.\ ((s[\vec{x} \to \vec{v}], h_2) \models_{X_j} Q_2)$$

Let $X_i$ and $X_j$ be the functions whose existence is stated in the formulas above. Then
the assumption that $\mathbb{X}$ is directed implies that there is some $X_k$ such that $X_k \in \mathbb{X}$
and $X_i \sqsubseteq X_k$ and $X_j \sqsubseteq X_k$. Lemma 5 then gives us $(s[\vec{x} \to \vec{v}], h_1) \models_{X_k} Q_1$ and
$(s[\vec{x} \to \vec{v}], h_2) \models_{X_k} Q_2$. We can then combine these and apply the definition of $\models_{X_k}$
(Figure 2.7) to conclude the following, which is the second conjunct of our goal.

$$\exists X_k \in \mathbb{X}.\ ((s[\vec{x} \to \vec{v}], h) \models_{X_k} Q_1 * Q_2)$$

The first conjunct of the goal is $s =_{fv(Q)} s$, which is immediate.

CASE $Q = Q_1 \land Q_2$ and $Q = Q_1 \lor Q_2$: These cases are very similar to the case above.
For $Q_1 \land Q_2$, we have the assumption below.

$$(s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} Q_1 \land Q_2$$

Applying the definition of $\models_{\bigsqcup \mathbb{X}}$ gives us $(s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} Q_1$ and $(s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} Q_2$.
Applying the inductive hypothesis yields $(s'[\vec{x} \to \vec{v}], h) \models_{X_i} Q_1$ and $(s''[\vec{x} \to \vec{v}], h) \models_{X_j} Q_2$
where $s =_{fv(Q_1)} s'$ and $s =_{fv(Q_2)} s''$. Applying Lemma 4 yields $(s[\vec{x} \to \vec{v}], h) \models_{X_i} Q_1$
and $(s[\vec{x} \to \vec{v}], h) \models_{X_j} Q_2$. Let $X_k$ be an upper bound of $X_i$ and $X_j$ in $\mathbb{X}$, which exists
because $\mathbb{X}$ is directed. By Lemma 5 we then have $(s[\vec{x} \to \vec{v}], h) \models_{X_k} Q_1$ and
$(s[\vec{x} \to \vec{v}], h) \models_{X_k} Q_2$, which implies $(s[\vec{x} \to \vec{v}], h) \models_{X_k} Q_1 \land Q_2$, which is our goal.

For $Q_1 \lor Q_2$ the proof is similar except that we only have one of $(s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} Q_1$
or $(s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} Q_2$. Without loss of generality, suppose it is $(s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} Q_1$
that holds. We then apply the inductive hypothesis, obtaining $(s'[\vec{x} \to \vec{v}], h) \models_{X_i} Q_1$ and
$s =_{fv(Q_1)} s'$. Let $s''$ be defined such that $s''(x) = s'(x)$ if $x \in fv(Q_1)$ and $s''(x) = s(x)$
otherwise. Consider some $y \in fv(Q_1 \lor Q_2)$. There are two cases. If $y \in fv(Q_1)$, then we
have $s''(y) = s'(y)$ and, due to $s' =_{fv(Q_1)} s$, we also have $s''(y) = s(y)$. If $y \notin fv(Q_1)$
then we have $s''(y) = s(y)$ by the definition of $s''$. Thus we have shown $s =_{fv(Q)} s''$. By
Lemma 4 we also have $(s''[\vec{x} \to \vec{v}], h) \models_{X_i} Q_1$. Thus we have shown our goal.

CASE $Q = \exists y.\ Q_1$: We first assume that $y$ is distinct from all elements of $\vec{x}$. This
can always be made to hold via $\alpha$-conversion. We have from the semantics of existential
quantification that there is some $v_y$ such that $((s[\vec{x} \to \vec{v}])[y \to v_y], h) \models_{\bigsqcup \mathbb{X}} Q_1$. As $y$ is
distinct from all elements of $\vec{x}$, we have that $(s[\vec{x} \to \vec{v}])[y \to v_y] = (s[y \to v_y])[\vec{x} \to \vec{v}]$.
We can then apply our inductive hypothesis with $s[y \to v_y]$ in place of $s$. This yields
$\exists s'.\ (s'[\vec{x} \to \vec{v}], h) \models_{X_i} Q_1$ for some $X_i$ and $s =_{fv(Q_1)} s'$. By the case for existentials
in the semantics of $\models_{X_i}$, this then implies $\exists s'.\ (s'[\vec{x} \to \vec{v}], h) \models_{X_i} \exists y.\ Q_1$, which is the
second conjunct of our goal. The first conjunct, $s =_{fv(Q)} s'$, is implied by our assumption
$s =_{fv(Q_1)} s'$ and the fact that $fv(Q_1) \supseteq fv(Q)$.

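CASE $Q = p(\vec{e})$: In this case, we have $(s[\vec{x} \to \vec{v}], h) \models_{\bigsqcup \mathbb{X}} p(\vec{e})$. The semantics for $\models_{\bigsqcup \mathbb{X}}$
from Figure 2.7 then gives us

$$h \in \bigl(\bigsqcup \mathbb{X}\bigr)(p)(\llbracket \vec{e} \rrbracket\, s[\vec{x} \to \vec{v}])$$

Applying the definition of $\bigsqcup$, this implies the following, where $X_i \in \mathbb{X}$.

$$h \in \bigcup_i \bigl(X_i(p)(\llbracket \vec{e} \rrbracket\, s[\vec{x} \to \vec{v}])\bigr)$$

This implies that there is some $X_j \in \mathbb{X}$ such that $h \in X_j(p)(\llbracket \vec{e} \rrbracket\, s[\vec{x} \to \vec{v}])$. Again
applying the semantics from Figure 2.7, we obtain

$$(s[\vec{x} \to \vec{v}], h) \models_{X_j} p(\vec{e})$$

We clearly have $s =_{fv(Q)} s$, so introducing an existential on $s$ then gives us our goal. □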

Theorem 6. Let $\bot_N = \{(p, \lambda \vec{x}.\ \emptyset) \mid p \in N\}$. Then $\bot_N$ is the least element of $\mathcal{X}_N$ with
respect to $\sqsubseteq$.

Proof. We will show that for all $X$ in $\mathcal{X}_N$ we have $\bot_N \sqsubseteq X$. Consider an arbitrary
$X \in \mathcal{X}_N$. Expanding the definition of $\sqsubseteq$, we must show that

$$\forall p, \vec{v}.\ (p \in N) \Rightarrow \bot_N(p)(\vec{v}) \subseteq X(p)(\vec{v})$$

Suppose $p \in N$ and choose an arbitrary $\vec{v}$. Expanding the definition of $\bot_N$, we must show
$\emptyset \subseteq X(p)(\vec{v})$. But this is immediate since $\emptyset$ is the least element with respect to $\subseteq$. □

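Theorem 7. The least fixed-point of $\omega_{\mathcal{D}}$ is $\bigsqcup \{\omega_{\mathcal{D}}^i \mid i \in \mathbb{N}\}$, where $\omega_{\mathcal{D}}^i$ is defined as follows.

$$\omega_{\mathcal{D}}^0 = \bot_{dom(\mathcal{D})} \qquad (2.9)$$
$$\omega_{\mathcal{D}}^{i+1} = \omega_{\mathcal{D}}(\omega_{\mathcal{D}}^i)$$

Proof. This follows from Theorem 6, Theorem 5, and Scott’s fixed-point theorem. □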

Least  Fixed-point  Semantics  of  Satisfaction  The  benefit  of  the  theory  of  least  fixed- 
points  developed  above  is  two-fold.  First,  it  ensures  that  fixed-points  exist  and  thus  that 
Definition  6  does  not  vacuously  hold.  Furthermore,  least  fixed-points  are  often  taken  as 
the  semantics  of  inductive  definitions.  Rather  than  Definition  6,  we  could  have  introduced 
the  following. 

Definition 9 (Alternate Satisfaction Relation). Let $\mathcal{D}$ be a set of inductive predicate
definitions and let $lfp(\omega_{\mathcal{D}})$ be the least fixed-point of $\omega_{\mathcal{D}}$ with respect to $\sqsubseteq$. Then we define
least fixed-point satisfaction of $Q$ with respect to inductive definitions $\mathcal{D}$ as follows.

$$(s, h) \Vdash_{\mathcal{D}} Q \quad\text{iff}\quad (s, h) \models_{lfp(\omega_{\mathcal{D}})} Q$$

The development in this thesis does not depend on which fixed-point is taken as the
meaning of a set of inductive predicates and could be carried out with either Definition 6 or
Definition 9. We chose Definition 6 since it is more general, in the sense that $(s, h) \models_{\mathcal{D}} Q$
implies $(s, h) \Vdash_{\mathcal{D}} Q$. This ensures that all results given in terms of the satisfaction relation
in Definition 6 also hold for the definition of satisfaction in terms of least fixed-points
(Definition 9).

Example Let $\mathcal{D}$ be the definition list containing the single inductively-defined predicate
below.

$$\begin{array}{l}
ls(n, start, end) = \\
\quad (\mathsf{emp} \land start = end \land n = 0) \\
\quad \lor\ (n > 0 \land (\exists z.\ (start \mapsto [next : z]) * ls(n - 1, z, end)))
\end{array}$$

Then $lfp(\omega_{\mathcal{D}})$ is the function that maps $ls$ to the following function (where $\#(S)$ represents
the cardinality of set $S$).

$$\begin{array}{l}
\lambda (n, s, e).\ \{h \mid \#(dom(h)) = n\ \land \\
\quad \exists a_1, \ldots, a_{n+1}.\ s = a_1 \land e = a_{n+1}\ \land \\
\quad (\forall i.\ 1 \le i \le n \Rightarrow (a_i \in dom(h) \land h(a_i) = \{(next, a_{i+1})\}))\}
\end{array}$$

This maps the tuple $(n, s, e)$ to the set of heaps containing only cells that are structured as
a solitary singly-linked list segment of length $n$. Examples of such heaps are the empty
heap $\{\}$, the singleton heap $\{(s, \{(next, e)\})\}$ and the heap below, which contains a list
segment of length 3 (in the set below, $a_0$ and $a_1$ must be chosen such that $a_0$, $a_1$ and $s$ are
all distinct).

$$\{(s, \{(next, a_0)\}),\ (a_0, \{(next, a_1)\}),\ (a_1, \{(next, e)\})\}$$
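
The iteration of Theorem 7 can be tried out on $ls$ over a small finite universe. The sketch below is only an illustration under stated assumptions: addresses are drawn from a hypothetical three-element set `ADDRS`, lengths are capped at `MAX_N`, a heap is encoded as a frozenset of `(address, next)` pairs, and the names `omega`, `bottom` and `lfp` are ours. On this finite universe the chain starting from the empty interpretation stabilises after finitely many steps.

```python
from itertools import product

ADDRS = (0, 1, 2)   # hypothetical address universe (assumption of this sketch)
MAX_N = 3           # largest list length considered


def keys():
    """All argument tuples (n, start, end) in the finite universe."""
    return product(range(MAX_N + 1), ADDRS, ADDRS)


def bottom():
    """The least interpretation: every argument tuple denotes the empty set of heaps."""
    return {k: frozenset() for k in keys()}


def omega(x):
    """One unfolding of the body of ls, reading recursive calls from x."""
    new = {}
    for n, s, e in keys():
        heaps = set()
        if n == 0 and s == e:
            heaps.add(frozenset())                     # emp /\ start = end /\ n = 0
        if n > 0:
            for z in ADDRS:                            # exists z
                for rest in x[(n - 1, z, e)]:
                    if all(a != s for a, _ in rest):   # '*' requires disjoint domains
                        heaps.add(rest | {(s, z)})     # start |-> [next : z]
        new[(n, s, e)] = frozenset(heaps)
    return new


def lfp():
    """Iterate omega from bottom until nothing changes."""
    x = bottom()
    while True:
        y = omega(x)
        if y == x:
            return x
        x = y


L = lfp()
```

In the result, every heap in `L[(n, s, e)]` has exactly `n` cells, `L[(1, 0, 1)]` contains only the one-cell heap in which address 0 points to 1, and the heap `{(0, 1), (1, 2)}` appears in `L[(2, 0, 2)]`.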

Defining  Inductive  Predicates  With  Characteristic  Formulae 

An alternative to defining an inductive predicate symbol as above is to describe it in terms
of the properties it satisfies. The key property of an inductive definition is that the
interpretation of the definition should establish an equivalence between the predicate and the
body of the definition. In fact, we will show in this section that requiring the predicate to
satisfy this equivalence is just the same as defining it via fixed-points as we did before. We
present this alternate approach because it more closely matches the reasoning performed
by the tool we have developed (which is described in Chapter 5).

First we define the characteristic formula associated with a definition. This is the
equivalence that we expect the interpretation of the predicate to satisfy.

Definition 10. Let the characteristic formula of a set of inductive definitions $\mathcal{D}$, denoted
$\lceil \mathcal{D} \rceil$, be defined as follows.

$$\lceil p_1(\vec{x}_1) = Q_1 :: \cdots :: p_n(\vec{x}_n) = Q_n \rceil = (\forall \vec{x}_1.\ p_1(\vec{x}_1) \iff Q_1) \land \ldots \land (\forall \vec{x}_n.\ p_n(\vec{x}_n) \iff Q_n)$$


Then we can show the following, which states that the set of fixed-points of $\omega_{\mathcal{D}}$ is
exactly the set of interpretations satisfying the characteristic formula of $\mathcal{D}$. Recall that
$\models Q$ holds if and only if $(s, h) \models Q$ holds for all $s, h$.

Theorem 8. For all $s, h, \mathcal{D}, Q$, we have $(s, h) \models_{\mathcal{D}} Q$ if and only if $(s, h) \models_X Q$ holds for
all $X \in \mathcal{X}_{dom(\mathcal{D})}$ such that $\models_X \lceil \mathcal{D} \rceil$.

Proof. We first note that the definition of $(s, h) \models_{\mathcal{D}} Q$ states that $(s, h) \models_{X'} Q$ for all $X'$
such that $\omega_{\mathcal{D}}(X') = X'$. We can complete the proof by showing that $\omega_{\mathcal{D}}(X) = X$ if and
only if $\models_X \lceil \mathcal{D} \rceil$.

Let $\mathcal{D} = p_1(\vec{x}_1) = Q_1 :: \ldots :: p_n(\vec{x}_n) = Q_n$. Then $\lceil \mathcal{D} \rceil$ is the formula below.

$$(\forall \vec{x}_1.\ p_1(\vec{x}_1) \iff Q_1) \land \ldots \land (\forall \vec{x}_n.\ p_n(\vec{x}_n) \iff Q_n)$$

By definition, $\models_X \lceil \mathcal{D} \rceil$ holds exactly when for all $s, h$ we have

$$(s, h) \models_X (\forall \vec{x}_1.\ p_1(\vec{x}_1) \iff Q_1) \land \ldots \land (\forall \vec{x}_n.\ p_n(\vec{x}_n) \iff Q_n)$$

Applying the semantics of satisfaction from Figure 2.7, we then have the following for
each $s, h, i, \vec{v}$.

$$(s[\vec{x}_i \to \vec{v}], h) \models_X (p_i(\vec{x}_i) \iff Q_i) \qquad (2.10)$$

We must show that $\omega_{\mathcal{D}}(X) = X$ implies the formula above for each $s, h, i, \vec{v}$,
as well as the reverse implication. We have that $\omega_{\mathcal{D}}(X) = X$ if and only if
$(\omega_{\mathcal{D}}(X))(p_i)(\vec{v}) = X(p_i)(\vec{v})$ for all $p_i \in dom(\mathcal{D})$. Expanding $\omega_{\mathcal{D}}$ in the previous
formula, we obtain the following for each $i$.

$$\{h \mid \exists s.\ (s[\vec{x}_i \to \vec{v}], h) \models_X Q_i\} = X(p_i)(\vec{v}) \qquad (2.11)$$

We now show that (2.10) holds if and only if (2.11) does, thus completing the proof. Suppose
(2.10) holds. Then we have $(s[\vec{x}_i \to \vec{v}], h) \models_X p_i(\vec{x}_i)$ if and only if $(s[\vec{x}_i \to \vec{v}], h) \models_X Q_i$.
Expanding the definition of satisfaction, we obtain $h \in X(p_i)(\llbracket \vec{x}_i \rrbracket\, s[\vec{x}_i \to \vec{v}])$ if and only
if $(s[\vec{x}_i \to \vec{v}], h) \models_X Q_i$ or, simplifying further, the following.

$$h \in X(p_i)(\vec{v}) \quad\text{iff}\quad (s[\vec{x}_i \to \vec{v}], h) \models_X Q_i$$

This holds if and only if

$$X(p_i)(\vec{v}) = \{h \mid (s[\vec{x}_i \to \vec{v}], h) \models_X Q_i\}$$

To show our goal (2.11) we must show that $(s[\vec{x}_i \to \vec{v}], h) \models_X Q_i$ if and only if
$\exists s.\ (s[\vec{x}_i \to \vec{v}], h) \models_X Q_i$. The forward direction is immediate. The backward direction
follows from Lemma 4 and the fact that, since $Q_i$ is the body of an inductive definition
with arguments $\vec{x}_i$, we have $fv(Q_i) \subseteq \vec{x}_i$. Since $s[\vec{x}_i \to \vec{v}] =_{\vec{x}_i} s'[\vec{x}_i \to \vec{v}]$ for any $s, s'$, the
Lemma allows us to assume the existence of some $s'$ such that $(s'[\vec{x}_i \to \vec{v}], h) \models_X Q_i$ and
conclude that $(s[\vec{x}_i \to \vec{v}], h) \models_X Q_i$. □

We  will  see  the  utility  of  this  theorem  when  we  discuss  our  implementation’s  treatment 
of  inductive  predicates  in  Section  5.2. 

Induction Induction is commonly used to prove properties of inductively defined
structures. Least fixed-points come with a built-in induction principle based on the construction
given in Theorem 7. When working in the context of the satisfaction relation given as
Definition 6, we do not have this principle available. However, we can still use mathematical
induction over the naturals as a justification for inductive proofs. For example, given the
list segment predicate $ls$ from our example (page 44), we can show the following by
induction on $n_1$.

$$\forall n_1, n_2, x, y, z.\ ls(n_1, x, y) * ls(n_2, y, z) \Rightarrow ls(n_1 + n_2, x, z)$$
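
To make the induction concrete, the following is a sketch of how the argument might go. It uses only the equivalence between $ls$ and its body, which holds in every fixed-point, together with associativity of $*$. It assumes $n_1$ ranges over the natural numbers (for negative $n$, $ls(n, \ldots)$ is unsatisfiable); the bound variable $w$ is introduced here for illustration.

```latex
\begin{align*}
\textbf{Base } (n_1 = 0):\quad
  & ls(0, x, y) * ls(n_2, y, z) \\
  \Rightarrow\ & (\mathsf{emp} \land x = y) * ls(n_2, y, z)
      && \text{unfold } ls(0, x, y) \\
  \Rightarrow\ & ls(0 + n_2, x, z)
      && \text{substitute } x \text{ for } y \\[1ex]
\textbf{Step } (n_1 > 0):\quad
  & ls(n_1, x, y) * ls(n_2, y, z) \\
  \Rightarrow\ & (\exists w.\ (x \mapsto [next : w]) * ls(n_1 - 1, w, y)) * ls(n_2, y, z)
      && \text{unfold } ls(n_1, x, y) \\
  \Rightarrow\ & \exists w.\ (x \mapsto [next : w]) * ls(n_1 - 1 + n_2, w, z)
      && \text{inductive hypothesis} \\
  \Rightarrow\ & ls(n_1 + n_2, x, z)
      && \text{fold, using } n_1 + n_2 > 0
\end{align*}
```

The final fold needs $n_2 \ge 0$, which follows from unfolding $ls(n_2, y, z)$ once: both disjuncts of the body fail when $n_2 < 0$.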


Even when there is no parameter present that is suitable for induction, we can still use
induction over the size of satisfying heaps to prove properties of our data structures.

2.3 Semantics of Programs

A  program  can  be  viewed  as  defining  a  transition  system.  In  this  section  we  first  give  the 
general  definitions  related  to  transition  systems  and  then  discuss  the  interpretation  of  a 
program  as  a  transition  system. 

2.3.1  Transition  Systems 

Definition 11. A transition system $S$ is a tuple $(A, I, F, \dashrightarrow)$ where $A$ is a set of states,
$I \subseteq A$ is a set of initial states, $F \subseteq A$ is a set of final states, and $\dashrightarrow\ \subseteq A \times A$ is a
transition relation.

Each  transition  system  defines  a  set  of  traces,  which  are  sequences  of  states  where 
adjacent  states  are  related  by  the  transition  relation.  We  use  the  following  standard  notation 
for  sequences. 

$\epsilon$ is the empty sequence.

$\gamma$ is a sequence consisting of one element, the execution state $\gamma$.

If $T_1$ and $T_2$ are sequences, then $T_1 T_2$ is the sequence that results from concatenating
$T_1$ and $T_2$. If $T_1$ is infinite, then $T_1 T_2 = T_1$.

$\gamma \in T$ holds iff $\exists T_1, T_2.\ T = T_1\, \gamma\, T_2$.

$len(T)$ is the length of sequence $T$. If $T$ is finite this is the number of elements in
$T$. If $T$ is infinite, then $len(T) = \omega$.

$T(i)$ is the $i$th element of $T$, with the first element given by $T(0)$. This is only defined
if $0 \le i < len(T)$. The last element of a finite sequence $T$ is given by $T(len(T) - 1)$.

$T^n$ is the trace obtained by discarding the first $n$ elements of trace $T$. That is, if
$T = \gamma_0\, \gamma_1 \cdots \gamma_{n-1}\, T'$ then $T^n = T'$. If $len(T) < n$ then $T^n = \epsilon$.

We then define traces as follows.

Definition 12. $T$ is a trace of transition system $(A, I, F, \dashrightarrow)$ iff

1. $len(T) > 0$

2. $T(0) \in I$

3. $\forall i.$ if $0 \le i < (len(T) - 1)$ then $T(i) \dashrightarrow T(i + 1)$

4. $T$ finite implies $T(len(T) - 1) \in F$.

We write $traces(A, I, F, \dashrightarrow)$ to represent the set of traces of the transition system
$(A, I, F, \dashrightarrow)$.
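
For finite sequences, the four conditions of Definition 12 can be checked directly. The sketch below assumes a finite transition system given by explicit Python sets; the function name `is_finite_trace` and the three-state example system are hypothetical. Infinite traces, which the definition also admits, cannot be represented as Python lists and are outside the scope of this check.

```python
def is_finite_trace(seq, initial, final, step):
    """Check conditions 1-4 of the trace definition for a finite sequence of states."""
    if len(seq) == 0:                      # 1. traces are non-empty
        return False
    if seq[0] not in initial:              # 2. the first state is initial
        return False
    for i in range(len(seq) - 1):          # 3. adjacent states are related by the transition relation
        if (seq[i], seq[i + 1]) not in step:
            return False
    return seq[-1] in final                # 4. a finite trace ends in a final state


# Hypothetical three-state system: a -> b -> c, plus a self-loop on b.
INITIAL = {"a"}
FINAL = {"c"}
STEP = {("a", "b"), ("b", "b"), ("b", "c")}
```

With this system, `["a", "b", "c"]` is a trace, while `["a", "b"]` is not: it stops in a non-final state, which is exactly the kind of stuck sequence that condition 4 rules out.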

2.3.2  Programs  As  Transition  Systems 

We  will  now  discuss  how  to  form  the  transition  system  corresponding  to  a  program  P.  We 
first  define  the  transition  relation  associated  with  program  P. 

Definition 13. Given program $P$, let $\longrightarrow_P$ be the least relation satisfying the following.

1. If $\gamma_1 \longrightarrow \gamma_2$ then $\gamma_1 \longrightarrow_P \gamma_2$

2. $goto(l, (s, h)) \longrightarrow_P (P(l), (s, h))$

This definition states that the program transitions as long as either the current
continuation can transition via the $\longrightarrow$ relation or a $goto(l, (s, h))$ state has been reached, in which
case execution proceeds from the continuation at $l$.

We can now define the interpretation of a program as a transition system. Recall that
$G$ is the set of all execution states.

Definition 14. We write $\langle\!\langle P \mid Q_0 \rangle\!\rangle$ to represent the transition system corresponding to
program $P$ with initial precondition $Q_0$. Let $I$ and $F$ be sets of states defined as follows.

$$I = \{goto(l_0, (s, h)) \mid (l_0 = initloc(P)) \land (s, h) \models Q_0\}$$

$$F = \{final(s, h) \mid s \in Stores \land h \in Heaps\} \cup \{error\}$$

Then $\langle\!\langle P \mid Q_0 \rangle\!\rangle = (G, I, F, \longrightarrow_P)$.

The semantics of a program $P$ is then taken to be the set of traces produced by the
transition system corresponding to $P$.

Definition 15. The meaning of program $P$ in initial state $Q_0$ is the set of traces given by
$traces(\langle\!\langle P \mid Q_0 \rangle\!\rangle)$.

Note that infinite traces arise not from execution at the continuation level, as
continuations always terminate, but rather from the execution of an infinite sequence of
continuations, each of which reaches a goto $l$ statement for some label $l$.

2.3.3  Transitive  Closure  of  Relations 

In addition to the relations $\longrightarrow$ and $\longrightarrow_P$, we will also use their non-reflexive transitive
closures, defined as follows.

Definition 16. If $R$ is a relation of type $A \times A \to Bool$ for some set $A$, then the transitive
closure of $R$, written as $R^+$, is the least relation satisfying

$$\forall a, b \in A.\ a R^+ b \iff ((a R b) \lor (\exists c \in A.\ a R c \land c R^+ b))$$

Thus, $\longrightarrow^+$ indicates the transitive closure of the $\longrightarrow$ relation, $\longrightarrow_P^+$ is the transitive
closure of $\longrightarrow_P$, etc.
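
For a finite relation, the least relation satisfying the equivalence in Definition 16 can be computed by iterating until nothing new is added. The sketch below assumes the relation is given as a Python set of pairs; the function name `transitive_closure` is ours.

```python
def transitive_closure(r):
    """Least r_plus with: a r_plus b iff a r b, or a r c and c r_plus b for some c."""
    closure = set(r)
    while True:
        # Pairs (a, b) obtained from one step a r c followed by c r_plus b.
        extra = {(a, b) for (a, c) in r for (c2, b) in closure if c == c2}
        if extra <= closure:
            return closure
        closure |= extra
```

The closure is non-reflexive in the sense of the definition: a pair `(a, a)` appears only if `a` can actually reach itself through one or more steps, as happens on a cycle.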

2.3.4  Deadlock  and  Angelic  Non-determinism 

We now consider how our semantics of branch statements interacts with the program
semantics just presented. In particular, we consider what occurs in an execution state of the
form

$$(\text{branch}\ e_1 \Rightarrow k_1, \ldots, e_n \Rightarrow k_n\ \text{end},\ (s, h))$$

where $\llbracket e_i \rrbracket\, s = \mathit{false}$ for all $i$. Such a state cannot make any transitions, thus it could only
appear at the end of a finite trace. But this is not permitted, since Definition 12 states that
the last state in a finite trace must be in $F$, the set of final states. Definition 14 specifies $F$
for our programs and this set does not contain any execution states of the form $(k, (s, h))$.
Such a state might be described as stuck or deadlocked. An important property of our trace
semantics is that traces are not allowed to contain deadlocked states.

We will further illustrate this with a concrete example. Consider the continuation
below.

$$k = (\text{branch}\ \mathit{true} \Rightarrow (\text{branch}\ e_1 \Rightarrow k_1\ \text{end}),\ \mathit{true} \Rightarrow (\text{branch}\ e_2 \Rightarrow k_2\ \text{end})\ \text{end})$$

Suppose $T$ is a trace of a program containing $k$ and that $T(i) = (k, (s, h))$. Then it must
be the case that $\llbracket e_1 \rrbracket\, s = \mathit{true}$ or $\llbracket e_2 \rrbracket\, s = \mathit{true}$. Otherwise, execution would get stuck as
neither (branch $e_1 \Rightarrow k_1$ end) nor (branch $e_2 \Rightarrow k_2$ end) would be able to transition from
memory state $(s, h)$. And as we just saw, such deadlocked states are not allowed to appear
in traces. Furthermore, if $\llbracket e_2 \rrbracket\, s = \mathit{false}$ then $T(i + 1) = (\text{branch}\ e_1 \Rightarrow k_1\ \text{end}, (s, h))$. That
is, non-determinism is resolved such that only cases which do not later cause execution to
deadlock are chosen. Such a situation is often described as angelic non-determinism. But
why is this the appropriate treatment of non-determinism here?

One answer is that, in some sense, it does not matter how we choose to deal with stuck
branches. The source language we actually consider, the C programming language,
contains only total branches, which are branches where the disjunction of the branch
conditions is equivalent to true. This ensures that, in the source program, execution can never
get stuck at a branch point. For any branch, there is always a well-defined next state.

Our soundness theorem will then tell us that every trace of the original program
corresponds to a trace of the numeric program. Thus, the fact that the numeric program
throws away deadlocked traces does not hurt us, since soundness tells us that those traces
were not necessary in order to obtain an over-approximation¹. Once we have an
over-approximation, this can be used to prove a variety of properties of the original program, as
we will see in Chapter 3.

If it does not matter for soundness, then why do we bother with this interpretation
of branches? The reason is that the numeric programs we generate constitute an
intermediate language for communicating with an external verification tool (an intermediate
language that corresponds to the input language of the tool). As such, it makes sense to
leverage the full power of this language and include the constructs that have proved to be
useful when verifying programs (and which are thus supported by most external
verification tools).

¹ For the purposes of this discussion, a program $P'$ is an over-approximation of a program $P$ iff the set
of traces of $P'$ contains the set of traces of $P$. More details are given in Chapter 3.

One such construct is the “assume” statement, which lets us represent, in the code,
properties that we know to be true at a given program point. For example, suppose that,
from a verification standpoint, the only important property of a library routine foo(x) is
that it always returns a non-negative number. Then we can represent this in the code by
replacing the statement “y = foo(x)” with “y := ?; assume(y ≥ 0)”. The statement
“assume(y ≥ 0)” indicates that we should only consider traces for which y ≥ 0 is true at
this point, and discard all other traces. Our branch statements, with the given semantics,
are similar in that the continuation “branch $e_1 \Rightarrow k_1$, $e_2 \Rightarrow k_2$ end” states that only traces
where $e_1$ or $e_2$ is true need to be considered. If we have only one condition, as in the
continuation “branch $e \Rightarrow k$ end,” then the semantics correspond exactly to our informal
description of assume(e) and we will adopt the notation assume(e); k as an abbreviation
for branch $e \Rightarrow k$ end.

In summary, since verification generally views a program as representing a set of traces
and attempts to over- or under-approximate those traces, having a command in the language
for filtering trace sets is very useful. Our semantics for the “branch ... end” construct
provides this. The difficulties that may be encountered if one attempts to actually
implement such a command are not a concern, since the source programs we consider do
not make use of the trace filtering aspect of these commands.


2.4  Representing  C  Programs 

The C language syntax contains a number of ambiguities and corner cases as described
in [Necula et al., 2002]. In our implementation, we use the framework described in that
paper (CIL) to reduce C to a more regular subset of the language. We will not go into
a large amount of detail on how CIL constructs can be translated into our language (the
CIL syntax is rather involved), but we will address some of the high-level issues that arise
when working with code originally written in the C language.

2.4.1  Control  Flow 

Figure  2.9  shows  how  various  control-flow  constructs  can  be  interpreted.  The  constructs 
considered  in  that  figure  are  all  well-structured,  in  that  they  do  not  contain  jumps  out  of 
loops  or  case  statements  that  fall  through.  Such  irregular  flow-of-control  can  be  dealt  with 
by  asking  CIL  to  convert  break  and  continue  statements  into  explicit  gotos. 
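
As a rough illustration of the while-loop rule in Figure 2.9, the sketch below writes the same loop twice in C: once in structured form and once with an explicit label and goto, mirroring the shape "l1 : branch e => c1; goto l1, ¬e => c2 end". The function names and the summation loop are hypothetical and chosen only for this example; they do not come from the thesis or from CIL.

```c
#include <assert.h>

/* Structured form: while (e) { c1 }  followed by c2. */
int sum_structured(int n) {
    int i = 0, total = 0;
    while (i < n) {          /* e  */
        total += i;          /* c1 */
        i++;
    }
    return total;            /* c2 */
}

/* Label/goto form, shaped like
   l1 : branch e => c1; goto l1,  !e => c2 end */
int sum_goto(int n) {
    int i = 0, total = 0;
l1:
    if (i < n) {             /* arm guarded by e */
        total += i;
        i++;
        goto l1;
    } else {                 /* arm guarded by !e */
        return total;
    }
}

/* Hypothetical self-check: both forms agree for every n in 0..limit. */
void check_agreement(int limit) {
    for (int n = 0; n <= limit; n++) {
        assert(sum_structured(n) == sum_goto(n));
    }
}
```

Calling check_agreement(20) exercises both versions on small inputs. Because the two guards e and !e are complementary, the branch is total, which matches the remark in Section 2.3 that C source programs contain only total branches.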

2.4.2  Memory  Operations 

Memory operations in C are considerably more complex than those permitted by the language
in Section 2.1. However, they can be reduced to the simpler memory model that we
use for our logic and analysis by a number of conversions. In the following, we will use the
term record to refer to a collection of values structured using named fields. In C,
these same constructs are called structures or structs. C requires that structure definitions
and types always be preceded by the struct keyword.²

² There are ways around this syntactic inconvenience, but for clarity and consistency, we do
not use such tricks in these examples.

Nested Records The C language allows nested records, as below, where (*out) indicates
the dereference of the memory cell at the address stored in out.

    struct inner {
        int x;
        int y;
    };

    struct outer {
        int x;
        struct inner in;
    };

    int main() {
        struct outer *out;
        out = malloc(sizeof(struct outer));
        (*out).in.x = 5;
    }


Such  records  can  be  flattened  to  contain  only  a  single  level  of  fields.  If  there  are  naming 
conflicts,  as  there  are  in  this  example,  then  fields  must  be  renamed  to  avoid  clashes.  Code 
equivalent  to  the  above  that  uses  only  a  single  level  of  record  structure  is  given  below. 

    struct outer {
        int x;
        int in_x;
        int in_y;
    };

    int main() {
        struct outer *out;
        out = malloc(sizeof(struct outer));
        (*out).in_x = 5;
    }


The code for main in our syntax then becomes

    out := alloc(x^i, in_x^i, in_y^i);
    out.in_x := 5;
    halt

    C construct:
        if (e) { c1 } else { c2 }
        l1: c3
    Translation:
        branch  e => ctrans(c1); goto l1,
               ¬e => ctrans(c2); goto l1 end;
        l1 : ctrans(c3)

    C construct:
        l1: while (e) { c1 }
        c2
    Translation:
        l1 : branch  e => ctrans(c1); goto l1,
                    ¬e => ctrans(c2) end

    C construct:
        switch (e) {
            case e1: c1; break;
            case e2: c2; break;
            ...
            case en: cn; break;
        }
        l1: c
    Translation:
        branch (e = e1) => ctrans(c1); goto l1,
               (e = e2) => ctrans(c2); goto l1,
               ...
               (e = en) => ctrans(cn); goto l1 end;
        l1 : ctrans(c)

Figure 2.9: Translations of C programs with regular control-flow into the syntax presented in
Section 2.1. The function “ctrans()” represents a recursive application of these rules. We
assume that fresh labels (l_i) are generated and inserted in the C program wherever necessary
to apply these rules. Translations for atomic commands are not given, but are discussed in
Section 2.4.2.

If the record is allocated on the stack rather than on the heap, then we can convert the
record fields to stack variables. For example, consider the code below.

    int main() {
        struct outer out;
        out.in_x = 5;
    }

This becomes the following.

    int main() {
        int out_x;
        int out_in_x;
        int out_in_y;
        out_in_x = 5;
    }

Translated into our language, this corresponds to

    out_in_x := 5; halt

Addresses  of  substructures  The  above  tricks  for  nested  records  fail  in  the  presence  of 
the  “address-of”  operator.  For  example,  C  permits  the  following,  which  specifies  a  record 
within  a  record  and  then  uses  “address-of”  (the  “&”  operator)  to  obtain  a  pointer  to  the 
inner  record. 

    int get_x(struct inner *in) {
        return (*in).x;
    }

    int main() {
        struct outer out;
        int x = get_x(&out.in);
    }


In such cases, to perform a faithful translation, we have to keep the record nesting
explicit, using pointers to connect the inner and outer records. In general, any time a
component of a record may have its address taken, we have to ensure that this component
is allocated as a separate heap cell. Below, we give the translation of the code above,
including updated versions of the structure definitions. Note that the inner structure is now
explicitly allocated on the heap.

    struct inner {
        int x;
        int y;
    };

    struct outer {
        int x;
        struct inner *in;
    };

    int get_x(struct inner *in) {
        return (*in).x;
    }

    int main() {
        struct outer out;
        out.in = malloc(sizeof(struct inner));
        int x = get_x(out.in);
    }

This can then be translated to the following code in our system (where the call to
get_x has been inlined).

    out_in := alloc(x^i, y^i);
    x := out_in.x

Pass by reference The “address-of” operator is also used to get around the call-by-value
nature of C language functions. In the following example, the function add_front uses
double-indirection to update the list pointer that is passed in by the main function.

    struct list {
        struct list *next;
        int data;
    };

    void add_front(struct list **lst, int v) {
        struct list *temp = malloc(sizeof(struct list));
        temp->data = v;
        temp->next = (*lst);
        *lst = temp;
    }

    int main() {
        struct list *p;
        p = 0;
        add_front(&p, 1);
        add_front(&p, 2);
        add_front(&p, 3);
    }


For such cases, as with nested records whose address is taken, we have to insert code
that lays out the structure in memory and change the commands that access the structure in
a way that is consistent with the semantics of the original code. The basic rule is the same
as before: any piece of memory that may have its address taken must be allocated as a
separate cell in the heap. The code below is the translation of the code above. Only the
code in main needs to be changed.

    int main() {
        struct list **p;
        p = malloc(sizeof(struct list *));
        *p = 0;
        add_front(p, 1);
        add_front(p, 2);
        add_front(p, 3);
    }

In general, if we have a stack variable x of type t whose address is taken, we must
change the type of x to “pointer to t.” At the start of the scope containing x, we allocate a
new heap cell and set x to the address of this cell. Commands that previously accessed x
are changed to instead access *(x) (the dereference of x) and commands that had the form
&x (address of x) are changed to instead refer to x directly.

The reason these rewrites are required is that, in our memory model, all fields associated
with a record are always referred to through a common address. Other models are
possible, in which record components are given different, often related, addresses. For
example, if addresses are taken to be natural numbers, record components can be laid out
sequentially in memory. Such models are sometimes referred to as field splitting models
(Berdine [2006]) and, while they enable easier treatment of record components whose
address is taken, they make it harder to write a rule for C-style de-allocation (where calling
free(x) causes the entire contiguous block starting at x to be freed).
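
The general rewrite for a stack variable whose address is taken can be sketched on a small integer example. The code below is a minimal illustration and not part of the thesis: the helper increment and the two function names are hypothetical, the allocation is assumed to succeed, and the call to free is added only so that the sketch does not leak memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callee that updates its argument through a pointer. */
static void increment(int *p) { *p = *p + 1; }

/* Before the rewrite: x lives on the stack and its address is taken. */
int before_rewrite(void) {
    int x = 5;
    increment(&x);                  /* &x: address of a stack variable */
    return x;
}

/* After the rewrite: x has type "pointer to int" and names a heap cell. */
int after_rewrite(void) {
    int *x = malloc(sizeof(int));   /* allocate the cell at the start of the scope */
    assert(x != NULL);              /* assumption of this sketch: allocation succeeds */
    *x = 5;                         /* accesses to x become *x */
    increment(x);                   /* &x becomes x */
    int result = *x;
    free(x);                        /* cleanup added for this sketch only */
    return result;
}
```

Both functions return 6, which is the sense in which the rewritten code is intended to preserve the behaviour of the original.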

2.4.3  Unhandled  Features 

There are a number of C language features that cannot be translated into the program
representation presented in Section 2.1. Pointer arithmetic cannot be translated, as we have
adopted a type system specifically aimed at eliminating that feature. Our language’s integer
variables also do not match up exactly with C’s integers. Our integers are unbounded
whereas in C there are several types of integer variable, each of which can store a different,
finite subset of the integers. For example, “unsigned long x” declares x to be a
variable that, on platforms where long is 32 bits wide, can store an unsigned 32-bit value
(that is, a value in the range 0 to 2^32 − 1). Such types could be easily added to our system.
In addition to the types a and i that we have already, we would simply have additional base
types representing bounded integers for which mathematical operations are performed
modulo the range.

Such additional types do not cause problems, and in fact are included in our implementation.
However, since our focus is on the type a of addresses and the analysis of data
structures built through pointer manipulations, we omit these types from the theory
presented here. Note that even if we add integer types corresponding to C’s bounded integers,
we still must retain the unbounded integer type i. This is needed because the size measures
associated with data structures are unbounded.

This  distinction  between  bounded  and  unbounded  integers  must  be  kept  in  mind  when 
choosing  tools  to  apply  to  the  numeric  programs  that  our  algorithm  generates.  Since  our 
numeric  programs  involve  unbounded  integers,  the  tools  we  use  to  analyze  them  must 
support  these.  Otherwise,  we  can  end  up  with  cases  where,  for  example,  we  repeatedly 
cons  onto  a  list,  increasing  the  length  by  one  each  time,  but  due  to  modular  arithmetic  the 
tool  concludes  that  the  list  is  eventually  empty  (length  equal  to  zero). 
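
The wrap-around problem can be reproduced with a deliberately small counter. The sketch below is an illustration under stated assumptions, not taken from the thesis: it tracks a list length in an 8-bit unsigned counter, so the effect shows up after only 256 increments, and it uses unsigned long as a stand-in for an unbounded integer, which is adequate only for the small inputs used here. Both function names are hypothetical.

```c
#include <assert.h>   /* used by the checks in the usage note below */
#include <stdint.h>

/* List length after n cons operations, tracked in an 8-bit counter.
   Converting back to uint8_t wraps the value modulo 256. */
uint8_t length_bounded(unsigned n) {
    uint8_t len = 0;
    for (unsigned i = 0; i < n; i++) {
        len = (uint8_t)(len + 1);
    }
    return len;
}

/* The same count in a wider type, standing in for an unbounded integer
   (adequate only for the small n used in this sketch). */
unsigned long length_wide(unsigned n) {
    unsigned long len = 0;
    for (unsigned i = 0; i < n; i++) {
        len = len + 1;
    }
    return len;
}
```

With these definitions, assert(length_bounded(256) == 0) and assert(length_wide(256) == 256) both pass: after 256 cons operations the bounded counter claims the list is empty while the wide counter reports the true length.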

Finally, we do not support arrays or unions. Verification of arrays has been extensively
studied [Halbwachs and Peron, 2008, Bozga et al., 2009, Gopan et al., 2005] and most of
these approaches could likely be incorporated into our analysis to provide some level of
support for arrays. A straightforward combination, such as a direct product of domains
[Cousot and Cousot, 1979], would allow for tracking of heap properties and tracking of
array properties, but would not permit interaction between the two. However, in C there are
many ways in which arrays and the heap can interact, perhaps more so than in other
languages, since C converts arrays to pointers in most expressions and so allows them to
appear in most contexts where a pointer would be expected. Tracking such interactions is an
interesting avenue of future work, but is outside the scope of this thesis.

2.5 Generating C Programs

The end goal of our analysis is to convert a program in the language given in Figure 2.1
into another program that only manipulates integer-valued variables and which can be
passed to a separate program analysis tool for further checking. The program we generate
will also be in the language given in Figure 2.1 and so we must consider how we will
represent this program in a format that standard verification tools can accept. Most of our
commands have standard analogues in C and other imperative languages. The exceptions
are non-deterministic assignment (x := ?) and our branch construct.

The input format for program analysis tools is generally either some specific programming
language, such as C or Java, or some form of transition system. The details vary and
we will not go into the specific translations required for each tool. Instead, we note that
we can generally perform such translations provided that the input language for the tool
supports two basic features: non-deterministic values and assume statements.

Non-deterministic  Values  Non-determinism  is  often  used  by  analysis  tools  to  abstract 
portions  of  the  code.  For  example,  functions  can  sometimes  be  soundly  abstracted  by 
assuming  that  their  result  is  non-deterministically  chosen.  Suppose  we  are  checking  the  C 
code  below  for  memory  safety. 

    a = foo();
    if (a > 0) {
        int *x = malloc(sizeof(int));
        *x = 0;
    }
    else {
        a = a - 1;
    }


Memory safety of this piece of code does not depend on the value of a, nor does it
depend on which branch is taken (both branches are memory safe from any starting state).
If we know that foo does not access the heap, then assuming that foo returns a
non-deterministically chosen value still results in sound reasoning about memory safety and
allows us to avoid analyzing the body of foo (which may be quite large).

Because this is a common abstraction technique, verification tools often expose the
ability to generate non-deterministically chosen values. For example, Blast recognizes
the special identifier __BLAST_NONDET, which always represents a fresh,
non-deterministically chosen value. Systems without a special non-deterministic value often
interpret undefined functions non-deterministically. For example, in ARMC, the code
x = foo(); is equivalent to x := ? in our language if the function foo is undefined.

Assume  Statements  Another  common  feature  is  support  for  assume  statements.  The 
semantics  of  the  sequence  of  statements  assume(e);  c  is  defined  such  that  control  only 
passes  to  c  if  the  expression  e  is  true.  Otherwise,  execution  blocks  or  silently  halts.  The 
effect  of  this,  and  the  source  for  this  statement’s  name,  is  that  it  allows  a  program  analysis 
tool  to  add  the  assumption  e  to  the  current  symbolic  state  before  analyzing  c. 

These statements can be used to model functions more precisely than non-deterministic
values alone allow us to. For example, if foo is known to return a positive value and not
modify the global state, then the command x := foo() can be abstracted by the code
x = nondet; assume(x > 0); where nondet represents a non-deterministically
chosen value. Our semantics results in the non-determinism being resolved angelically,
that is, a non-deterministic value is chosen which satisfies the following assume statement.
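
One way to picture the combination of a non-deterministic value with an assume statement is as a filter over candidate traces. The sketch below is a simplified illustration and not the mechanism used by any particular tool: it enumerates a small, hypothetical range of candidate values in place of x = nondet, discards the candidates that fail assume(x > 0), and checks an example property (x - 1 >= 0) on the survivors only. The function name and the chosen property are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Explores every candidate x in [lo, hi] as a stand-in for "x = nondet",
   keeps only those passing assume(x > 0), and reports whether the
   property x - 1 >= 0 holds on every surviving candidate.
   *survivors receives the number of candidates that passed the assume. */
bool property_holds_after_assume(int lo, int hi, int *survivors) {
    *survivors = 0;
    for (int x = lo; x <= hi; x++) {
        if (!(x > 0)) {
            continue;            /* assume(x > 0) fails: discard this trace */
        }
        (*survivors)++;
        if (!(x - 1 >= 0)) {
            return false;        /* property violated on a surviving trace */
        }
    }
    return true;
}
```

For the range -3..3 the function reports that the property holds and that three candidates (1, 2 and 3) survive the assume; the discarded candidates play no part in the verdict.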

Often, verification tools accept a version of C that is augmented with an assume statement
that has the semantics above. Even if assume is not present in the input language
explicitly, the command

    assume(e); c

can be modeled as

    if (e)
        { c }
    else
        { exit(0); }

where exit(0) causes normal (non-error) termination of the program.

Representing Branches These two features combine to let us faithfully encode our
branch construct. If we have the code below

    branch e1 => k1,
           e2 => k2,
           ...
           en => kn end

then this can be encoded by the following sequence of conditionals, non-deterministic
assignment, and assume statements. We write c1 for the translation of k1, c2 for the
translation of k2, etc.

    a = nondet;
    if (a == 1)
        { assume(e1); c1; }
    else if (a == 2)
        { assume(e2); c2; }
    ...
    else if (a == n)
        { assume(en); cn; }
    else
        { assume(false); }

This encoding ensures that all valid paths through the code will be explored. The
variable a can take on any value, and so any sound analysis tool must explore each branch.
In each case, the analysis is allowed to assume the condition for that case (e1, e2, etc.).
The branch where none of the conditions are true is modeled with assume(false),
which indicates that there are no valid executions along this branch (and this is exactly the
semantics of our branch construct in the case where all branch conditions are false).
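
The encoding can be tried out on a concrete two-arm branch. The sketch below is an illustration under stated assumptions rather than tool input: it encodes the hypothetical branch "branch x = 0 => x := x + 1; halt, x > 0 => x := x - 1; halt end", represents a failed assume by returning false, and replaces the non-deterministic choice of a by a loop over a few candidate values. The function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Runs the arm selected by a. Returns false when an assume fails,
   meaning the trace is discarded; otherwise stores the final x in *out. */
bool run_branch(int a, int x, int *out) {
    if (a == 1) {
        if (!(x == 0)) return false;   /* assume(x == 0) */
        *out = x + 1;
        return true;
    } else if (a == 2) {
        if (!(x > 0)) return false;    /* assume(x > 0) */
        *out = x - 1;
        return true;
    } else {
        return false;                  /* assume(false) */
    }
}

/* Enumerates the choices a = 0..3 in place of "a = nondet" and counts
   the surviving traces; *last receives the result of the last survivor. */
int count_traces(int x, int *last) {
    int n = 0;
    for (int a = 0; a <= 3; a++) {
        int out;
        if (run_branch(a, x, &out)) {
            n++;
            *last = out;
        }
    }
    return n;
}
```

Starting from x = 0 exactly one trace survives and ends with x = 1; starting from x = 5 exactly one trace survives and ends with x = 4; starting from x = -1 no trace survives, which corresponds to the case where all branch conditions are false.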




Chapter  3 

Abstractions  and  Program  Properties 


In  Chapter  2  we  gave  the  semantics  of  programs  in  terms  of  the  traces  produced  by  a 
transition  system.  In  this  chapter,  we  present  the  logic  we  will  use  for  describing  properties 
of  these  traces.  A  common  language  for  describing  properties  of  traces  is  linear  temporal 
logic  (LTL)  [Clarke  et  al.,  1999],  and  the  logic  we  describe  in  the  next  section  is  based  on 
this. 

In addition to presenting the logic we use for stating program properties, we formally
define a notion of program abstraction in this chapter. Roughly, a program P' is an
abstraction of program P with respect to some property $\phi$ if whenever $\phi$ holds of P', it
also holds of P.

When setting up a framework for program abstraction, it is common for a program
and its abstraction to require different numbers of execution steps to arrive at the same
result. To take a simple example, the command x := 1 and the commands skip; x := 1
both transition to a state in which x has the value 1, but the second sequence requires two
steps to reach this state.

This motivates the use of a logic for program properties that is not sensitive to the
number of steps taken, and the logic we describe in this chapter has this property. We also
present equivalence relations between traces that are insensitive to the number of steps
taken and use this notion of equivalence to formally define a notion of program abstraction.
Finally, we conclude by highlighting four specific program properties that we have
focused on in our experiments.

The  techniques  used  in  this  chapter  are  tailored  toward  our  semantic  domain  but  are 
based  on  standard  notions  of  stuttering  equivalence,  simulation  and  stuttering  simulation 
[Milner,  1971,  Browne  et  al.,  1988]. 
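
The idea of comparing traces while ignoring the number of steps can be sketched on the x := 1 versus skip; x := 1 example. The code below is a deliberately simplified illustration and not the definition used in this thesis: a state is reduced to the value of x alone (so the continuation component of an execution state is ignored), traces are finite, the initial value of x is assumed to be 0, and two traces are compared after collapsing runs of repeated states. The names destutter, stutter_equivalent and MAX_TRACE are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRACE 64

/* Collapses each run of equal adjacent states into one state, in place.
   Returns the length of the collapsed trace. */
size_t destutter(int *trace, size_t len) {
    if (len == 0) return 0;
    size_t out = 1;
    for (size_t i = 1; i < len; i++) {
        if (trace[i] != trace[out - 1]) {
            trace[out++] = trace[i];
        }
    }
    return out;
}

/* Two finite traces are treated as equivalent here when they agree
   after repeated adjacent states have been collapsed. */
bool stutter_equivalent(const int *a, size_t la, const int *b, size_t lb) {
    int ca[MAX_TRACE], cb[MAX_TRACE];
    assert(la <= MAX_TRACE && lb <= MAX_TRACE);
    memcpy(ca, a, la * sizeof(int));
    memcpy(cb, b, lb * sizeof(int));
    size_t na = destutter(ca, la);
    size_t nb = destutter(cb, lb);
    return na == nb && memcmp(ca, cb, na * sizeof(int)) == 0;
}
```

Under these assumptions the trace {0, 1} of x := 1 and the trace {0, 0, 1} of skip; x := 1 are reported as equivalent, while {0, 1} and {0, 1, 0} are not.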


3.1  LTSL 

In this section we describe a temporal logic based on LTL\X [Clarke et al., 1999], or
“linear temporal logic without X (the next-time operator).” This logic supports the stating
of program properties involving constraints on ordering, necessity, and properties of
sequences of events, but does not permit specifications of exactly how many steps are
involved in satisfying the property. The variant of LTL\X presented here differs from
standard LTL\X in that the atomic propositions consist of separation logic formulae and
the traces over which temporal formulae are interpreted can be finite. The resulting logic
will be referred to as LTSL (for “linear temporal separation logic”). The syntax of the
logic is given in Figure 3.1.

An atomic formula is either a separation logic formula $Q$, the formula $\mathit{err}$, which
represents an error state, the formula $\mathit{final}$, which represents a non-error final state, or
the formula $\mathit{atloc}(l)$, which indicates that the current execution state is associated with
label $l$. An LTSL formula is then composed of these atomic formulae plus the temporal
operators $\mathbf{G}$, $\mathbf{F}$, and $\mathbf{U}$ and the Boolean operators $\wedge$, $\vee$ and $\sim$, corresponding to
conjunction, disjunction, and negation, respectively. We use these symbols in order to
distinguish the connectives at the level of path formulae from the connectives $\wedge$, $\vee$, and
$\neg$ that were already defined for separation logic formulae. We define implication as
$a \supset b$ if and only if ${\sim}a \vee b$.

The semantics of the LTSL constructs is defined in Figure 3.2. Recall that $T_n$ is the
trace obtained by discarding the first $n$ elements of trace $T$ (resulting in the empty trace
$\epsilon$ if $T$ does not contain at least $n$ elements). A separation logic formula holds at a state
if the store and heap at that state satisfy the formula. The $\mathit{err}$ and $\mathit{final}$ formulas hold
of error and final states respectively. The $\mathit{atloc}(l)$ formula holds if a state is of the form
$\mathit{goto}(l, (s, h))$. The semantics of the path formulas involves reasoning about a sequence
of states. The formula $\mathbf{G}\phi$ holds if $\phi$ holds globally, that is, it holds of every suffix of the
sequence. The formula $\mathbf{F}\phi$ holds if $\phi$ holds of some suffix of the sequence. If we interpret
the sequence as a series of points in time, then $\mathbf{G}\phi$ says that $\phi$ holds at all future points,
whereas $\mathbf{F}\phi$ says that $\phi$ holds at some future point. Note that “future” here includes what
might, in common usage, be referred to as the “present” (that is, it includes the first state
in the trace). The formula $\phi_1 \,\mathbf{U}\, \phi_2$ holds when $\phi_2$ holds at some future point and $\phi_1$ holds
at every point up to (but not necessarily including) the point at which $\phi_2$ holds.

\[
\begin{array}{llcl}
\text{State Formulae} & \zeta & ::= & Q \mid \mathit{err} \mid \mathit{final} \mid \mathit{atloc}(l) \\
\text{Path Formulae} & \phi & ::= & \zeta \mid \phi \wedge \phi \mid \phi \vee \phi \mid {\sim}\phi \mid \mathbf{G}\phi \mid \mathbf{F}\phi \mid \phi \,\mathbf{U}\, \phi
\end{array}
\]

Figure 3.1: Syntax of the logic LTSL.

An LTSL formula holds of a transition system $S$ if and only if it holds of all traces of
$S$. The relation $T \models_X \phi$ below is the one given in Figure 3.2.

Definition 17. Let $S$ be a transition system. Then $S \models_X \phi$ iff
$\forall T \in \mathit{traces}(S).\ T \models_X \phi$.

We say that an LTSL formula $\phi$ holds of a program $P$ with initial states satisfying $Q_0$ iff

\[
((P \mid Q_0)) \models_X \phi.
\]

LTL\X is generally interpreted over infinite paths. However, our execution traces can
be finite and the semantics presented in Figure 3.2 provides for interpretation of LTSL
formulae over finite paths. The interpretation of the LTSL operators over finite paths
given here is consistent with the other common method of accommodating finite paths,
which is to extend them to infinite paths by replicating the final state.
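
The finite-trace reading of G, F and U described above can be sketched as a small recursive evaluator. This is a minimal illustration under simplifying assumptions and not the thesis implementation: a state is a single integer, atomic formulae are C predicates on that integer rather than separation logic formulae, and all type, function and constant names (Formula, holds, F_TWO and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ATOM, NOT, AND, OR, GLOBALLY, FINALLY, UNTIL } Kind;

typedef struct Formula {
    Kind kind;
    bool (*atom)(int);             /* used only when kind == ATOM */
    const struct Formula *left;    /* operand, or left operand */
    const struct Formula *right;   /* right operand of AND, OR, UNTIL */
} Formula;

/* Does formula f hold of the finite trace t[0..len-1]?
   The suffix starting at position i is (t + i, len - i). */
bool holds(const Formula *f, const int *t, size_t len) {
    switch (f->kind) {
    case ATOM:
        return len > 0 && f->atom(t[0]);
    case NOT:
        return !holds(f->left, t, len);
    case AND:
        return holds(f->left, t, len) && holds(f->right, t, len);
    case OR:
        return holds(f->left, t, len) || holds(f->right, t, len);
    case GLOBALLY:                 /* every suffix satisfies the operand */
        for (size_t i = 0; i < len; i++)
            if (!holds(f->left, t + i, len - i)) return false;
        return true;
    case FINALLY:                  /* some suffix satisfies the operand */
        for (size_t i = 0; i < len; i++)
            if (holds(f->left, t + i, len - i)) return true;
        return false;
    case UNTIL:                    /* right holds at i, left at every j < i */
        for (size_t i = 0; i < len; i++) {
            if (holds(f->right, t + i, len - i)) return true;
            if (!holds(f->left, t + i, len - i)) return false;
        }
        return false;
    }
    return false;                  /* not reached for well-formed formulas */
}

/* Example atoms over integer states. */
static bool is_two(int x) { return x == 2; }
static bool at_most_two(int x) { return x <= 2; }
static bool always(int x) { (void)x; return true; }

static const Formula A_TWO  = { ATOM, is_two, NULL, NULL };
static const Formula A_LE2  = { ATOM, at_most_two, NULL, NULL };
static const Formula A_TRUE = { ATOM, always, NULL, NULL };

static const Formula F_TWO         = { FINALLY, NULL, &A_TWO, NULL };
static const Formula TRUE_U_TWO    = { UNTIL, NULL, &A_TRUE, &A_TWO };
static const Formula G_LE2         = { GLOBALLY, NULL, &A_LE2, NULL };
static const Formula NOT_LE2       = { NOT, NULL, &A_LE2, NULL };
static const Formula F_NOT_LE2     = { FINALLY, NULL, &NOT_LE2, NULL };
static const Formula NOT_F_NOT_LE2 = { NOT, NULL, &F_NOT_LE2, NULL };
```

On the trace {0, 1, 2, 0}, holds reports the same verdict for F_TWO and TRUE_U_TWO, and the same verdict for G_LE2 and NOT_F_NOT_LE2. In this integer model that agrees with the equivalences between F and U, and between G and F, stated later in Section 3.1.3.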

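Note that, as in the semantics for separation logic formulae given in Figure 2.7, the
satisfaction relation given here is parametric in the set of inductive predicates $X$. All the
properties we discuss in this section will hold for any set $X$ satisfying the conditions given
in Section 2.2.2. Thus, all theorems given in this section should be considered universally
quantified over $X$, unless otherwise specified.

State Formulae

\[
\begin{array}{lcl}
\gamma \models_X \mathit{err} & \text{iff} & \gamma = \mathit{error} \\
\gamma \models_X \mathit{final} & \text{iff} & \gamma = \mathit{final}(s, h) \text{ for some } s, h \\
\gamma \models_X \mathit{atloc}(l) & \text{iff} & \gamma = \mathit{goto}(l, (s, h)) \text{ for some } s, h \\
\gamma \models_X Q & \text{iff} & \text{there exist } s, h \text{ such that } (s, h) \models_X Q \text{ and} \\
 & & (\gamma = \langle k, (s, h) \rangle \text{ for some } k, \text{ or } \gamma = \mathit{final}(s, h), \text{ or} \\
 & & \ \gamma = \mathit{goto}(l, (s, h)) \text{ for some } l)
\end{array}
\]

Path Formulae

\[
\begin{array}{lcl}
T \models_X \zeta & \text{iff} & \mathit{len}(T) > 0 \text{ and } T(0) \models_X \zeta \\
T \models_X {\sim}\phi & \text{iff} & T \not\models_X \phi \\
T \models_X \phi_1 \vee \phi_2 & \text{iff} & T \models_X \phi_1 \text{ or } T \models_X \phi_2 \\
T \models_X \phi_1 \wedge \phi_2 & \text{iff} & T \models_X \phi_1 \text{ and } T \models_X \phi_2 \\
T \models_X \mathbf{G}\phi & \text{iff} & \forall i.\ 0 \le i < \mathit{len}(T) \text{ implies } T_i \models_X \phi \\
T \models_X \mathbf{F}\phi & \text{iff} & \exists i.\ 0 \le i < \mathit{len}(T) \text{ and } T_i \models_X \phi \\
T \models_X \phi_1 \,\mathbf{U}\, \phi_2 & \text{iff} & \exists i.\ 0 \le i < \mathit{len}(T) \text{ and } T_i \models_X \phi_2 \\
 & & \text{and } (\forall j.\ 0 \le j < i \text{ implies } T_j \models_X \phi_1)
\end{array}
\]

Figure 3.2: Semantics of LTSL formulae. The notation $T_i$ denotes the suffix of $T$ starting at
position $i$ (where the first element has position 0). The satisfaction relation for $Q$ is in
Figure 2.7. We write $T \not\models_X \phi$ to indicate that the relation $T \models_X \phi$ does not hold.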

3.1.1  Notation 

To facilitate the compact representation of execution states, we will sometimes label
control points in continuations with numbers enclosed in circles. We then use each number to
refer to the continuation starting at that point in the term. For example, the continuation
below contains four numbered control points.

    ① branch x = 0 => ② x := x + 1; halt,
             x > 0 => ③ x := x - 1; ④ halt end

The numbers then represent the following continuations:

    ① = branch x = 0 => x := x + 1; halt,
               x > 0 => x := x - 1; halt end
    ② = x := x + 1; halt
    ③ = x := x - 1; halt
    ④ = halt


3.1.2  Examples 

Consider the following program.

    P1 =
      L0 : ① x := 0; ② goto L1;
      L1 : ③ branch x < 2 => ④ x := x + 1; ⑤ goto L1,
                    x ≥ 2 => ⑥ x := 0; ⑦ goto L1
             end

Below is an example trace through this system. We only show the value of variable x since
this is the only variable that appears in the program. We start this example trace in a state
where x has the value 12. Similar traces would exist for all initial values of x.

    goto(L0, ({(x, 12)}, {}))
    ⟨①, ({(x, 12)}, {})⟩
    ⟨②, ({(x, 0)}, {})⟩
    goto(L1, ({(x, 0)}, {}))
    ⟨③, ({(x, 0)}, {})⟩
    ⟨④, ({(x, 0)}, {})⟩
    ⟨⑤, ({(x, 1)}, {})⟩
    goto(L1, ({(x, 1)}, {}))
    ⟨③, ({(x, 1)}, {})⟩
    ⟨④, ({(x, 1)}, {})⟩
    ⟨⑤, ({(x, 2)}, {})⟩
    goto(L1, ({(x, 2)}, {}))
    ⟨③, ({(x, 2)}, {})⟩
    ⟨⑥, ({(x, 2)}, {})⟩
    ⟨⑦, ({(x, 0)}, {})⟩
    goto(L1, ({(x, 0)}, {}))
    ⋮

We will now state some properties satisfied by this trace. First, it does not terminate.
This corresponds to the LTSL formula ${\sim}(\mathbf{F}(\mathit{final} \vee \mathit{err}))$. It also visits location $L_1$
infinitely often. This corresponds to the formula $\mathbf{G}(\mathbf{F}(\mathit{atloc}(L_1)))$. Note that the formula
$\mathbf{G}(\mathbf{F}(\zeta))$ does not, in general, guarantee that $\zeta$ holds infinitely often. It can also be
satisfied by finite traces ending in a state satisfying $\zeta$. This means that our example
formula $\mathbf{G}(\mathbf{F}(\mathit{atloc}(L_1)))$ would also be satisfied by any finite trace ending in a state of the
form $\mathit{goto}(L_1, (s, h))$. However, such traces are ruled out by the semantics of programs
given in Definition 14. Since the state $\mathit{goto}(l, (s, h))$ can always make a transition, it is not
allowed to be the final state in a trace.

Finally, at label $L_1$ in the example program, $x$ is always less than or equal to 2, which
corresponds to the formula $\mathbf{G}(\mathit{atloc}(L_1) \supset x \le 2)$. All of these properties are satisfied by
all traces of the program and thus hold of the transition system $((P_1 \mid \mathit{true}))$.

As a second example, consider the program below.

    P2 =
      L0 : x := nil; a := 0; goto L1;
      L1 : branch true => t := alloc(next); t.next := x;
                          x := t; a := a + 1; goto L1,
                  true => halt
           end

This program satisfies the property $\mathbf{G}(\mathit{atloc}(L_1) \supset \mathit{ls}(a, x, \mathit{nil}))$, where
$\mathit{ls}(a, x, \mathit{nil})$ is the predicate defined below, which states that there is a list of length $a$
starting at memory address $x$.

\[
\begin{array}{l}
\mathit{ls}(n, \mathit{start}, \mathit{end}) = \\
\quad (\mathbf{emp} \wedge \mathit{start} = \mathit{end} \wedge n = 0) \\
\quad \vee\ (n > 0 \wedge (\exists z.\ (\mathit{start} \mapsto [\mathit{next} : z]) * \mathit{ls}(n - 1, z, \mathit{end})))
\end{array}
\]

It is also the case that every trace either visits location $L_1$ infinitely often, or the trace
terminates in a state $\mathit{final}(s, h)$. This corresponds to the property
$\mathbf{F}(\mathit{final}) \vee \mathbf{G}(\mathbf{F}(\mathit{atloc}(L_1)))$.

3.1.3 Core Connectives

Not all the connectives defined in Figure 3.2 need to be considered primitive. Many can
be defined in terms of other connectives. The following list of connectives is sufficient to
define the others.

\[
\wedge \qquad {\sim} \qquad \mathbf{U}
\]

The following theorem shows how to define the other connectives in terms of these. In the
following, we write $\phi \Leftrightarrow \phi'$ as shorthand for $\forall T.\ (T \models_X \phi)$ iff $(T \models_X \phi')$.

Theorem 9.

\[
\begin{array}{rcll}
\phi_1 \vee \phi_2 & \Leftrightarrow & {\sim}({\sim}\phi_1 \wedge {\sim}\phi_2) & \qquad (3.4) \\
\mathbf{F}\phi & \Leftrightarrow & \mathit{true} \,\mathbf{U}\, \phi & \qquad (3.5) \\
\mathbf{G}\phi & \Leftrightarrow & {\sim}(\mathbf{F}({\sim}\phi)) & \qquad (3.6)
\end{array}
\]


Proof. Equivalence 1: $\phi_1 \vee \phi_2 \Leftrightarrow {\sim}({\sim}\phi_1 \wedge {\sim}\phi_2)$

Suppose we have a trace $T$ and $T \models_X \phi_1 \vee \phi_2$. Then either $T \models_X \phi_1$ or
$T \models_X \phi_2$. Without loss of generality, suppose it is $T \models_X \phi_1$ that holds. Then
$T \models_X {\sim}\phi_1$ does not hold and thus $T \models_X ({\sim}\phi_1) \wedge ({\sim}\phi_2)$ does not hold. But this
means that $T \models_X {\sim}(({\sim}\phi_1) \wedge ({\sim}\phi_2))$ does hold, thus establishing the forward direction
of the equivalence.

For the backward direction, assume that ${\sim}({\sim}\phi_1 \wedge {\sim}\phi_2)$ holds of $T$. Then
$({\sim}\phi_1 \wedge {\sim}\phi_2)$ does not hold of $T$. This implies that either ${\sim}\phi_1$ or ${\sim}\phi_2$ does not
hold. Without loss of generality, assume it is ${\sim}\phi_1$ that does not hold. Then $\phi_1$ does hold,
which implies that $\phi_1 \vee \phi_2$ does hold of $T$.

Equivalence 2: $F\varphi \Leftrightarrow \mathit{true} \mathbin{U} \varphi$

Suppose $T \models_X F\varphi$ for an arbitrary T. Then by the semantics in Figure 3.2 we have
that there is an i satisfying

$$0 \le i < \mathit{len}(T) \text{ and } T_i \models_X \varphi$$

We must show the following

$$\exists i'.\ 0 \le i' < \mathit{len}(T) \text{ and } T_{i'} \models_X \varphi \text{ and } \forall j.\ 0 \le j < i' \text{ implies } T_j \models_X \mathit{true}$$

We let i' be i. Our assumption on i tells us that the formula $0 \le i' < \mathit{len}(T)$ is
satisfied, as is $T_{i'} \models_X \varphi$. All that remains is to show

$$\forall j.\ 0 \le j < i' \text{ implies } T_j \models_X \mathit{true}$$

Since $j < i'$ and $i' < \mathit{len}(T)$ we have that $j \le \mathit{len}(T) - 2$ and thus the
trace $T_j$ contains at least two states. This implies that $T_j(0)$ cannot be the final state in
the trace $T_j$. This fact ensures that $T_j(0)$ has either the form (k, (s, h)) or
goto(l, (s, h)). In either case, we have $(T_j(0) \models_X \mathit{true})$ and thus
$(T_j \models_X \mathit{true})$. Since j was arbitrary, we have this for all j.

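For the reverse direction, suppose that $(T \models_X \mathit{true} \mathbin{U} \varphi)$ holds.
Then we have

$$\exists i.\ 0 \le i < \mathit{len}(T) \text{ and } T_i \models_X \varphi \text{ and } \forall j.\ 0 \le j < i \text{ implies } T_j \models_X \mathit{true}$$

But this implies

$$\exists i.\ 0 \le i < \mathit{len}(T) \text{ and } T_i \models_X \varphi$$

(we have simply dropped the last conjunct). This is the semantics of $F\varphi$.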

Equivalence 3: $G\varphi \Leftrightarrow {\sim}(F({\sim}\varphi))$

Suppose we have $G\varphi$. Then by the semantics of LTSL (Figure 3.2) we have

$$\forall i.\ 0 \le i < \mathit{len}(T) \text{ implies } T_i \models_X \varphi \tag{3.7}$$

We must show that $F({\sim}\varphi)$ does not hold. The proof is by contradiction. Suppose
$F({\sim}\varphi)$ did hold. Then there would exist a j with $0 \le j < \mathit{len}(T)$ such that
$T_j \models_X {\sim}\varphi$. This implies that $T_j \models_X \varphi$ does not hold. But by
(3.7) we have that $T_j \models_X \varphi$ does hold, leading to a contradiction.

For the backward direction, suppose that ${\sim}(F({\sim}\varphi))$ holds. Then we have that the
following does not hold

$$\exists i.\ 0 \le i < \mathit{len}(T) \text{ and } T_i \models_X {\sim}\varphi$$

This is equivalent to saying that the following formula does hold

$$\forall i.\ \lnot(0 \le i < \mathit{len}(T)) \text{ or } T_i \not\models_X {\sim}\varphi$$

Expanding the semantics of ${\sim}$, this is equivalent to

$$\forall i.\ \lnot(0 \le i < \mathit{len}(T)) \text{ or } T_i \models_X \varphi$$

If we now pick an arbitrary j and suppose that $0 \le j < \mathit{len}(T)$, then the assumption
above tells us that $T_j \models_X \varphi$ must hold. Thus we have

$$\forall j.\ 0 \le j < \mathit{len}(T) \text{ implies } T_j \models_X \varphi$$

which is the definition of $T \models_X G\varphi$. □

3.2 Stuttering Equivalence

We consider traces equivalent up to repeated states or stuttering. We use a definition of
stuttering based on that in [Manolios, 2001] and [Martí-Oliet et al., 2008]. To formally
define stuttering, we first define what it means for traces to match according to an
equivalence relation E.

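Definition 18. If T and T' are traces, we write $\mathit{matches}(T, T', \alpha, \beta, E)$ iff E
is an equivalence relation on states and $\alpha$ and $\beta$ are strictly increasing functions
$\alpha, \beta : \mathbb{N} \to \mathbb{N}$ with $\alpha(0) = \beta(0) = 0$ such that, for all
$i, j, k \in \mathbb{N}$,

$$\alpha(i) \le j < \alpha(i + 1) \text{ and } \beta(i) \le k < \beta(i + 1)$$

implies

$$(j < \mathit{len}(T) \Leftrightarrow k < \mathit{len}(T')) \text{ and } (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T'(k))$$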


The functions $\alpha$ and $\beta$ partition the traces into matching segments. The condition that
$(j < \mathit{len}(T)) \Leftrightarrow (k < \mathit{len}(T'))$ ensures that, if the traces are both
finite, then the final segment of T matches the final segment of T'. It also ensures that if the
final segment of T ends at $\alpha(i)$ then $\alpha(i) = \mathit{len}(T)$ and
$\beta(i) = \mathit{len}(T')$. In essence, this states that there is no segment that "straddles"
the end of either trace.
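For finite traces, the condition of Definition 18 can be checked directly. The sketch below makes several assumptions that are not in the original text: traces are Python sequences, the matching functions are given as finite lists of segment boundaries and extended past the last boundary by counting up by one (the helper `extend` is hypothetical), and the caller is trusted to supply strictly increasing functions.

```python
def extend(points):
    """Turn a finite list of segment boundaries into a strictly increasing
    function on the naturals by counting up by one past the last boundary."""
    def fn(i):
        if i < len(points):
            return points[i]
        return points[-1] + (i - len(points) + 1)
    return fn


def matches(t, t2, alpha, beta, related):
    """Check the condition of Definition 18 for finite traces t and t2.

    Assumes alpha and beta are strictly increasing; this is not verified here.
    """
    if alpha(0) != 0 or beta(0) != 0:
        return False
    # Strictly increasing functions satisfy alpha(i) >= i, so every segment
    # with index max(len) or later lies beyond the end of both traces, where
    # the condition holds trivially.
    for i in range(max(len(t), len(t2)) + 1):
        for j in range(alpha(i), alpha(i + 1)):
            for k in range(beta(i), beta(i + 1)):
                if (j < len(t)) != (k < len(t2)):
                    return False          # a segment straddles the end of a trace
                if j < len(t) and not related(t[j], t2[k]):
                    return False          # states in matching segments differ
    return True


def same(a, b):
    return a == b


aligned = matches("aabbb", "abb", extend([0, 2, 5]), extend([0, 1, 3]), same)
straddling = matches("aabbb", "abb", extend([0, 2, 6]), extend([0, 1, 3]), same)
print(aligned, straddling)
```

In the second call the segment [2, 6) of the first trace runs past its end while the corresponding segment of the second trace does not, which is exactly the situation the length condition rules out.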

We  can  now  define  stuttering  equivalence  of  traces  with  respect  to  an  equivalence 
relation  E. 

Definition 19. Two traces T and T' are E-stuttering equivalent, written $T \sim_E T'$, iff

$$\exists \alpha, \beta.\ \mathit{matches}(T, T', \alpha, \beta, E).$$


If two traces match, there is always a canonical $\alpha, \beta$ that witness this. The canonical
matching function for trace T, written $B_T$, is defined below.

Definition 20. Given a trace T, let $B_T$ be the strictly increasing function of type
$\mathbb{N} \to \mathbb{N}$ defined as follows.

$$B_T(0) = 0$$

$$B_T(i + 1) = \begin{cases} \text{the least } j \text{ such that } j > B_T(i) \land \lnot(T(j) \mathrel{E} T(B_T(i))) & \text{if such a } j \text{ exists} \\ \mathit{len}(T) & \text{if no such } j \text{ exists and } B_T(i) < \mathit{len}(T) \text{ and } T \text{ is finite} \\ B_T(i) + 1 & \text{otherwise} \end{cases}$$

The function $B_T$ divides T into blocks such that all elements within the same block are
related by E and these blocks have maximum size. If T is finite, the last of these blocks
ends at $\mathit{len}(T)$. If T is infinite, either the first case of the definition will apply
infinitely often, or we will eventually reach some tail consisting of elements that are all
E-related. If this happens, then the third case of the definition applies and $B_T$ begins
counting up by one at each step. Note that $B_T$ is clearly strictly increasing. For each case of
the inductive definition, we have that $B_T(i + 1) > B_T(i)$.
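For a finite trace the three cases of Definition 20 can be computed directly. The following sketch assumes traces are Python sequences and that `related` implements the equivalence relation E; the function name `canonical_blocks` is introduced here for illustration. Because the trace is finite, the third case only applies once the previous boundary has reached the end of the trace.

```python
def canonical_blocks(trace, related, count):
    """Return [B_T(0), ..., B_T(count)] for a finite trace."""
    n = len(trace)
    bounds = [0]                                   # B_T(0) = 0
    for _ in range(count):
        cur = bounds[-1]
        if cur < n:
            # First case: least j > cur whose state is not E-related to the
            # state at cur. Second case: no such j, so the block ends at len(T).
            nxt = next(
                (j for j in range(cur + 1, n) if not related(trace[j], trace[cur])),
                n,
            )
        else:
            nxt = cur + 1                          # third case: count up by one
        bounds.append(nxt)
    return bounds


def same(a, b):
    return a == b


blocks_t = canonical_blocks("aabbb", same, 4)
blocks_t2 = canonical_blocks("abb", same, 4)
print(blocks_t, blocks_t2)
```

Each returned list is strictly increasing, and consecutive entries below the trace length delimit maximal runs of E-related states.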

The  following  theorem  then  states  that  if  a  match  exists,  the  matching  functions  can  be 
replaced  with  the  canonical  matching  functions  for  the  two  traces. 

Theorem 10. If $\mathit{matches}(T, T', \alpha, \beta, E)$ then $\mathit{matches}(T, T', B_T, B_{T'}, E)$.

Proof. We have $B_T(0) = 0$ and $B_{T'}(0) = 0$ from the definition of B. This is one condition
for $\mathit{matches}(T, T', B_T, B_{T'}, E)$. To complete the proof, we must show that the
following holds for arbitrary i, j, k.

$$B_T(i) \le j < B_T(i + 1) \text{ and } B_{T'}(i) \le k < B_{T'}(i + 1)$$

implies

$$(j < \mathit{len}(T) \Leftrightarrow k < \mathit{len}(T')) \text{ and } (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T'(k))$$

Let i, j, k be as above. We then case split on the case of Definition 20 that was used to
define $B_T(i + 1)$.

CASE 1 [First or second case of Definition 20 was used for $B_T(i + 1)$] In this case, we
can establish the following, which states that if a block of $B_T$ ends at some index, then
there is also a block of $\alpha$ that ends at that index, and similarly for $B_{T'}$ and $\beta$.
Furthermore, if it is the rth block of $\alpha$ that coincides with $B_T$, then it is also the rth
block of $\beta$ that coincides with $B_{T'}$.

$$\forall q \in \mathbb{N}.\ q \le i + 1 \Rightarrow \exists r \in \mathbb{N}.\ B_T(q) = \alpha(r) \land B_{T'}(q) = \beta(r) \tag{3.8}$$

Proof. We show this by induction on q. The 0 case is straightforward. We let r = 0. Since
$B_T(0)$, $B_{T'}(0)$, $\alpha(0)$, and $\beta(0)$ are all equal to 0, we have the equalities in the
conclusion immediately.

For the inductive case, we assume that there exists some r such that $B_T(q) = \alpha(r)$
and $B_{T'}(q) = \beta(r)$ and we show there exists some s such that $B_T(q + 1) = \alpha(s)$ and
$B_{T'}(q + 1) = \beta(s)$ provided $q + 1 \le i + 1$.

Showing $B_T(q + 1) = \alpha(s)$. We have $q + 1 \le i + 1$, which implies $q \le i$. Since
$B_T(q + 1)$ was defined by either the first or second case of Definition 20, we also have that
either there is some next block of elements not related by E to those at $B_T(q)$ or $B_T(q)$
marks the start of the last block of E-related elements in a finite trace. Since $\alpha$ is
strictly increasing, there is some s such that $\alpha(s) \le B_T(q + 1) < \alpha(s + 1)$. If
$\alpha(s) = B_T(q + 1)$ then we have shown the first conjunct of our goal. We will show that in
the other case we obtain a contradiction. Suppose $\alpha(s) < B_T(q + 1)$. Then we have

$$\alpha(s) \le B_T(q + 1) - 1 < B_T(q + 1) < \alpha(s + 1) \tag{3.9}$$

and thus, because we have $\mathit{matches}(T, T', \alpha, \beta, E)$, we know that the following
holds.

$$T(B_T(q + 1) - 1) \mathrel{E} T(B_T(q + 1))$$

This contradicts the maximality of block q of $B_T$ if $B_T(q + 1)$ is the index of the next
block that is not E-related to $T(B_T(q))$ (that is, if $B_T(q + 1)$ is defined via the first case
in Definition 20). If $B_T(q + 1) = \mathit{len}(T)$ (that is, if $B_T(q + 1)$ was defined via the
second case in Definition 20), then we case split on whether $\alpha(s) = \mathit{len}(T)$. If it
does, then we are done, as $B_T(q + 1) = \mathit{len}(T)$ and thus $B_T(q + 1) = \alpha(s)$. If it
does not, then we again have (3.9). Because $\mathit{matches}(T, T', \alpha, \beta, E)$ holds, this
implies $B_T(q + 1) - 1 < \mathit{len}(T)$ if and only if $B_T(q + 1) < \mathit{len}(T)$. But this
cannot be since $B_T(q + 1) = \mathit{len}(T)$.

Showing $B_{T'}(q + 1) = \beta(s)$. To show that $\beta(s) = B_{T'}(q + 1)$, we note that we
have $\alpha(r) = B_T(q)$ and $\alpha(s) = B_T(q + 1)$. This implies that there are $s - r$ blocks
of $\alpha$ which correspond to the single block of $B_T$ from q to q + 1. Because we have
$\mathit{matches}(T, T', \alpha, \beta, E)$, each of these blocks of $\alpha$ must match the
corresponding block of $\beta$. This implies
$\forall x.\ \beta(r) \le x < \beta(s) \Rightarrow T'(\beta(x)) \mathrel{E} T'(\beta(r))$. To show
that $B_{T'}(q + 1) = \beta(s)$, we must show that this segment from $\beta(r)$ to $\beta(s)$
constitutes a maximal block of E-related elements in T'. We already have that the elements are
E-related. To see that it is maximal, first note that one of the first two cases of Definition 20
was used to define $B_T(q + 1)$. From this, we have that either $\alpha(s) = \mathit{len}(T)$ or
$\lnot(T(\alpha(r)) \mathrel{E} T(\alpha(s)))$. Due to $\mathit{matches}(T, T', \alpha, \beta, E)$,
this implies that either $\beta(s) = \mathit{len}(T')$ or
$\lnot(T'(\beta(r)) \mathrel{E} T'(\beta(s)))$. In either case, we have a maximal block of
E-related elements in T' and so the definition of $B_{T'}$ ensures $B_{T'}(q + 1) = \beta(s)$. □

We now return to the proof of the following.

$$B_T(i) \le j < B_T(i + 1) \text{ and } B_{T'}(i) \le k < B_{T'}(i + 1)$$

implies

$$(j < \mathit{len}(T) \Leftrightarrow k < \mathit{len}(T')) \text{ and } (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T'(k))$$

We first show the requirement that elements in the same block be E-related (the second
conjunct in the consequent). Suppose $B_T(i) \le j < B_T(i + 1)$ and
$B_{T'}(i) \le k < B_{T'}(i + 1)$. We have from (3.8) that there exists some r such that
$B_T(i) = \alpha(r)$ and $B_{T'}(i) = \beta(r)$. From $\mathit{matches}(T, T', \alpha, \beta, E)$ we
then have $T(\alpha(r)) \mathrel{E} T'(\beta(r))$ and thus we have
$T(B_T(i)) \mathrel{E} T'(B_{T'}(i))$. Since $B_T(i + 1)$ is the first index s such that
$s > B_T(i)$ and either $\lnot(T(B_T(i)) \mathrel{E} T(s))$ or $s = \mathit{len}(T)$, we have that
$T(B_T(i)) \mathrel{E} T(j)$ for all j such that $B_T(i) \le j < B_T(i + 1)$. Similarly, since
$B_{T'}(i + 1)$ is either $\mathit{len}(T')$ or the index of the first element after $B_{T'}(i)$ in
T' that is not E-related to the element at $B_{T'}(i)$, we have
$T'(B_{T'}(i)) \mathrel{E} T'(k)$ for all k satisfying $B_{T'}(i) \le k < B_{T'}(i + 1)$. Since E is
an equivalence relation and $T(B_T(i)) \mathrel{E} T'(B_{T'}(i))$, this gives us
$T(j) \mathrel{E} T'(k)$ as desired.

For the length requirement, we have that either the first or second case of the definition
of $B_T(i + 1)$ applies, implying that either $B_T(i + 1) < \mathit{len}(T)$ or
$B_T(i + 1) = \mathit{len}(T)$. In either case, for any j with $B_T(i) \le j < B_T(i + 1)$ we have
$j < \mathit{len}(T)$. It remains to show that for k satisfying $B_{T'}(i) \le k < B_{T'}(i + 1)$ we
have $k < \mathit{len}(T')$. From (3.8) we have that there is some r such that
$B_T(i + 1) = \alpha(r)$ and $B_{T'}(i + 1) = \beta(r)$. This, together with
$\mathit{matches}(T, T', \alpha, \beta, E)$ and $\alpha(r) \le \mathit{len}(T)$, implies that
$\beta(r) \le \mathit{len}(T')$ and thus $B_{T'}(i + 1) \le \mathit{len}(T')$, which implies
$k < \mathit{len}(T')$ as required.

CASE 2 [Third case of Definition 20 was used for $B_T(i + 1)$] In this case, we have that
$B_T(i)$ is some point along an infinite tail of T where all elements are E-related. Let i' be
the first element in this tail, which is necessarily less than or equal to $B_T(i)$. Either
$i' = 0$ or there is some block of E-related elements prior to this infinite tail. We consider
each case separately.

CASE $i' = 0$: In this case, T consists entirely of an infinite sequence of elements that
are E-related. Since we have $\mathit{matches}(T, T', \alpha, \beta, E)$, this implies that T' is
an infinite sequence of elements such that for all x, x' we have $T(x) \mathrel{E} T'(x')$. Given
such a situation, it trivially follows that for our j and k we have $T(j) \mathrel{E} T'(k)$.

CASE $i' > 0$: In this case, there is some block of T prior to the infinite tail of
E-related elements. Let $B_T(x)$ mark the start of this block. Since $i' > B_T(x)$ and
$\lnot(T(B_T(x)) \mathrel{E} T(i'))$, we have that the first case of Definition 20 must have been
used when defining $B_T(x)$. Thus, CASE 1 applies to $B_T(x)$, as does (3.8). That is, we have the
following.

$$\forall q \in \mathbb{N}.\ q \le x + 1 \Rightarrow \exists r \in \mathbb{N}.\ B_T(q) = \alpha(r) \land B_{T'}(q) = \beta(r)$$

This implies that there is some r such that $B_T(x + 1) = \alpha(r)$ and
$B_{T'}(x + 1) = \beta(r)$. This plus $\mathit{matches}(T, T', \alpha, \beta, E)$ implies that
$T(B_T(x + 1)) \mathrel{E} T'(B_{T'}(x + 1))$. Since $B_T(x)$ marks the start of the block just
before the infinite tail, $B_T(x + 1)$ marks the start of the infinite tail (and so we have
$i' = B_T(x + 1)$). Since $B_T(x + 1) = \alpha(r)$ and $\mathit{matches}(T, T', \alpha, \beta, E)$,
it must be the case that $\beta(r)$, which is equal to $B_{T'}(x + 1)$, marks the start of an
infinite tail of E-related elements in T'. From $T(B_T(x + 1)) \mathrel{E} T'(B_{T'}(x + 1))$, it
follows that for all $y \ge B_T(x + 1)$ and for all $z \ge B_{T'}(x + 1)$, we have
$T(y) \mathrel{E} T'(z)$. Thus, we will have our result (that $T(j) \mathrel{E} T'(k)$) if we can
show that $j \ge B_T(x + 1)$ and $k \ge B_{T'}(x + 1)$.

Since $i' = B_T(x + 1)$, and we have $i' \le B_T(i)$, we have $B_T(x + 1) \le B_T(i)$. Since $B_T$
is strictly increasing, this implies $x + 1 \le i$. Since $B_{T'}$ is strictly increasing we then
have $B_{T'}(x + 1) \le B_{T'}(i)$. Since $j \ge B_T(i)$ and $k \ge B_{T'}(i)$ we then have our
result.

For the length requirement, we have in both cases that T is infinite and thus, because
of $\mathit{matches}(T, T', \alpha, \beta, E)$, T' is also infinite. So the
$j < \mathit{len}(T) \Leftrightarrow k < \mathit{len}(T')$ conjunct of our goal holds trivially
since $\mathit{len}(T) = \mathit{len}(T') = \omega$. □
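Theorem 10 suggests a simple decision procedure for finite traces: rather than searching for witnesses, compute the canonical functions of both traces and test whether they match. The sketch below works under that reading and under the assumptions that traces are finite Python sequences and that `related` is an equivalence relation; the names `canonical`, `matches`, and `stutter_equivalent` are illustrative and do not come from the thesis.

```python
def canonical(trace, related):
    """Return B_T as a function on the naturals, for a finite trace."""
    n = len(trace)
    bounds = [0]
    while bounds[-1] < n:
        cur = bounds[-1]
        # Least later index whose state is not related to the block's first
        # state, or len(T) when the rest of the trace stays in the block.
        bounds.append(next(
            (j for j in range(cur + 1, n) if not related(trace[j], trace[cur])),
            n,
        ))
    # Past the end of the trace, B_T counts up by one at each step.
    return lambda i: bounds[i] if i < len(bounds) else bounds[-1] + i - len(bounds) + 1


def matches(t, t2, alpha, beta, related):
    """Condition of Definition 18 for finite traces (alpha, beta strictly increasing)."""
    if alpha(0) != 0 or beta(0) != 0:
        return False
    for i in range(max(len(t), len(t2)) + 1):
        for j in range(alpha(i), alpha(i + 1)):
            for k in range(beta(i), beta(i + 1)):
                if (j < len(t)) != (k < len(t2)):
                    return False
                if j < len(t) and not related(t[j], t2[k]):
                    return False
    return True


def stutter_equivalent(t, t2, related):
    """Decide E-stuttering equivalence of finite traces via the canonical functions."""
    return matches(t, t2, canonical(t, related), canonical(t2, related), related)


def same(a, b):
    return a == b


results = [
    stutter_equivalent("aabbb", "abb", same),
    stutter_equivalent("aab", "abb", same),
    stutter_equivalent("ab", "aba", same),
]
print(results)
```

The third pair fails because the second trace has an extra block with no counterpart in the first, so one canonical segment lies inside its trace while the corresponding segment lies beyond the end of the other.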

The relation $\sim_E$ is symmetric, reflexive, and transitive. These properties result from
the following properties of matches.

Lemma 7. The following three statements hold of the matches relation.

$$\mathit{matches}(T, T', \alpha, \beta, E) \Rightarrow \mathit{matches}(T', T, \beta, \alpha, E)$$

$$\mathit{matches}(T, T, \lambda x.\ x, \lambda x.\ x, E)$$

$$\mathit{matches}(T, T', \alpha, \alpha', E) \land \mathit{matches}(T', T'', \alpha', \alpha'', E) \Rightarrow \mathit{matches}(T, T'', \alpha, \alpha'', E)$$

Proof. Recall that E is an equivalence relation. The first property, symmetry, follows from
the fact that the definition of matches is symmetric in $T, \alpha$ and $T', \beta$. The second
property, reflexivity, is proved as follows. Both $\alpha$ and $\beta$ are the identity function,
so T is partitioned by $\alpha$ (resp. $\beta$) into blocks consisting of a single element. Thus,
we must establish that for any $i \in \mathbb{N}$ we have
$i < \mathit{len}(T) \Leftrightarrow i < \mathit{len}(T)$ and
$i < \mathit{len}(T) \Rightarrow T(i) \mathrel{E} T(i)$. The first property is a tautology and the
second follows from the fact that E is an equivalence relation and thus is reflexive.

For the third property, transitivity, we have $\alpha(0) = \alpha'(0) = 0$ and
$\alpha'(0) = \alpha''(0) = 0$, thus $\alpha(0) = \alpha''(0) = 0$. This is the first part of the
definition of matches. For the second part, we have the following

$$\forall i, j, k.\ (\alpha(i) \le j < \alpha(i + 1)) \land (\alpha'(i) \le k < \alpha'(i + 1)) \Rightarrow (j < \mathit{len}(T) \Leftrightarrow k < \mathit{len}(T')) \land (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T'(k))$$

$$\forall i, j, k.\ (\alpha'(i) \le j < \alpha'(i + 1)) \land (\alpha''(i) \le k < \alpha''(i + 1)) \Rightarrow (j < \mathit{len}(T') \Leftrightarrow k < \mathit{len}(T'')) \land (j < \mathit{len}(T') \Rightarrow T'(j) \mathrel{E} T''(k))$$

and we must show the following

$$\forall i, j, k.\ (\alpha(i) \le j < \alpha(i + 1)) \land (\alpha''(i) \le k < \alpha''(i + 1)) \Rightarrow (j < \mathit{len}(T) \Leftrightarrow k < \mathit{len}(T'')) \land (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T''(k))$$

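The following derivation establishes this.

 1  $\forall i, j, k.\ \alpha(i) \le j < \alpha(i + 1) \land \alpha'(i) \le k < \alpha'(i + 1) \Rightarrow ((j < \mathit{len}(T)) \Leftrightarrow (k < \mathit{len}(T'))) \land (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T'(k))$  (Given)

 2  $\forall i, j, k.\ \alpha'(i) \le j < \alpha'(i + 1) \land \alpha''(i) \le k < \alpha''(i + 1) \Rightarrow ((j < \mathit{len}(T')) \Leftrightarrow (k < \mathit{len}(T''))) \land (j < \mathit{len}(T') \Rightarrow T'(j) \mathrel{E} T''(k))$  (Given)

 3  $\alpha(i) \le j < \alpha(i + 1)$  (Assumption)

 4  $\alpha''(i) \le k < \alpha''(i + 1)$  (Assumption)

 5  $\exists k'.\ \alpha'(i) \le k' < \alpha'(i + 1)$  ($\alpha'$ is strictly increasing)

 6  $\alpha'(i) \le k' < \alpha'(i + 1)$  ($\exists$-elim)

 7  $((j < \mathit{len}(T)) \Leftrightarrow (k' < \mathit{len}(T'))) \land (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T'(k'))$  (Line 1 with lines 3 and 6)

 8  $((k' < \mathit{len}(T')) \Leftrightarrow (k < \mathit{len}(T''))) \land (k' < \mathit{len}(T') \Rightarrow T'(k') \mathrel{E} T''(k))$  (Line 2 with lines 6 and 4)

 9  $(j < \mathit{len}(T)) \Leftrightarrow (k < \mathit{len}(T''))$  (First conjuncts of lines 7 and 8 and transitivity of $\Leftrightarrow$)

10  $j < \mathit{len}(T)$  (Assumption)

11  $T(j) \mathrel{E} T'(k')$  (Line 7 second conjunct and line 10)

12  $k' < \mathit{len}(T')$  (Line 7 first conjunct and line 10)

13  $T'(k') \mathrel{E} T''(k)$  (Line 8 second conjunct and above)

14  $T(j) \mathrel{E} T''(k)$  (Transitivity of E and lines 11 and 13)

15  $j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T''(k)$  ($\Rightarrow$-introduction lines 10 and 14)

16  $((j < \mathit{len}(T)) \Leftrightarrow (k < \mathit{len}(T''))) \land (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T''(k))$  ($\land$-intro lines 9 and above)

17  $\alpha(i) \le j < \alpha(i + 1) \land \alpha''(i) \le k < \alpha''(i + 1) \Rightarrow ((j < \mathit{len}(T)) \Leftrightarrow (k < \mathit{len}(T''))) \land (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T''(k))$  ($\Rightarrow$-intro: 3 and 4)

□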


Given Lemma 7, we can now establish that $\sim_E$ is an equivalence relation.

Theorem 11. $\sim_E$ is an equivalence relation.

Proof. That $\sim_E$ is reflexive and symmetric follows immediately from Lemma 7 and the
definition of $\sim_E$. Transitivity also requires Theorem 10. We have $T \sim_E T'$ and
$T' \sim_E T''$ and must show $T \sim_E T''$. From the definition of $\sim_E$ applied to our two
assumptions, we have $\mathit{matches}(T, T', \alpha, \beta, E)$ and
$\mathit{matches}(T', T'', \alpha', \beta', E)$. By Theorem 10 we can convert these assumptions to
$\mathit{matches}(T, T', B_T, B_{T'}, E)$ and $\mathit{matches}(T', T'', B_{T'}, B_{T''}, E)$.
By Lemma 7 we then have $\mathit{matches}(T, T'', B_T, B_{T''}, E)$ which implies
$T \sim_E T''$. □

Furthermore, given an appropriate equivalence relation, we can even compose $\sim_E$
statements involving different Es.

Theorem 12. Let E'' be an equivalence relation satisfying the following.

$$\forall a, b, c.\ (a \mathrel{E} b \land b \mathrel{E'} c \Rightarrow a \mathrel{E''} c)$$

Then $T \sim_E T'$ and $T' \sim_{E'} T''$ implies $T \sim_{E''} T''$.

Proof. We first apply the definition of $\sim$ (Definition 19) to obtain
$\mathit{matches}(T, T', \alpha, \beta, E)$ and $\mathit{matches}(T', T'', \alpha', \beta', E')$ for
some $\alpha, \beta, \alpha', \beta'$. We then apply Theorem 10 to obtain
$\mathit{matches}(T, T', B_T, B_{T'}, E)$ and $\mathit{matches}(T', T'', B_{T'}, B_{T''}, E')$. We
now show that $\mathit{matches}(T, T'', B_T, B_{T''}, E'')$ holds and thus $T \sim_{E''} T''$.

Let i, j, k be such that $B_T(i) \le j < B_T(i + 1)$ and $B_{T''}(i) \le k < B_{T''}(i + 1)$.
We must show $j < \mathit{len}(T) \Leftrightarrow k < \mathit{len}(T'')$ and that
$j < \mathit{len}(T)$ implies $T(j) \mathrel{E''} T''(k)$. From
$\mathit{matches}(T, T', B_T, B_{T'}, E)$ we have that $T(j) \mathrel{E} T'(B_{T'}(i))$. From our
assumption $\mathit{matches}(T', T'', B_{T'}, B_{T''}, E')$ we have
$T'(B_{T'}(i)) \mathrel{E'} T''(k)$. Combining these, we have $T(j) \mathrel{E''} T''(k)$, which is
one of our goals.

For $j < \mathit{len}(T) \Leftrightarrow k < \mathit{len}(T'')$, we note that
$\mathit{matches}(T, T', B_T, B_{T'}, E)$ implies
$j < \mathit{len}(T) \Leftrightarrow B_{T'}(i) < \mathit{len}(T')$ and
$\mathit{matches}(T', T'', B_{T'}, B_{T''}, E')$ implies
$B_{T'}(i) < \mathit{len}(T') \Leftrightarrow k < \mathit{len}(T'')$. Combining these, we have our
goal of $j < \mathit{len}(T) \Leftrightarrow k < \mathit{len}(T'')$. □

3.2.1  Mapping  Between  Stuttering  Equivalent  Traces 

The  following  Lemma  will  be  very  useful  in  several  upcoming  proofs.  It  establishes  the 
existence  of  functions  that  map  between  related  positions  in  stuttering  equivalent  traces. 

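Lemma 8. If $T \sim_E T'$ then there exist functions $f : \mathbb{N} \to \mathbb{N}$ and
$f^{-1} : \mathbb{N} \to \mathbb{N}$ such that $\forall i.\ T_i \sim_E T'_{f(i)}$ and
$\forall i.\ T_{f^{-1}(i)} \sim_E T'_i$ and $f$ and $f^{-1}$ are monotonic and
$\forall i.\ f^{-1}(f(i)) \le i$.

Proof. Since $T \sim_E T'$ we have that there are strictly increasing functions $\alpha, \beta$
with the properties listed in Definition 18 and reproduced below.

$$\alpha, \beta \text{ strictly increasing} \tag{3.10}$$

$$\alpha(0) = \beta(0) = 0 \tag{3.11}$$

$$\forall i, j, k.\ \alpha(i) \le j < \alpha(i + 1) \land \beta(i) \le k < \beta(i + 1) \Rightarrow (j < \mathit{len}(T) \Leftrightarrow k < \mathit{len}(T')) \land (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T'(k)) \tag{3.12}$$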

We first define $f(i)$. Since $i \in \mathbb{N}$ we have $i \ge 0$. Because $\alpha$ is strictly
increasing and $\alpha(0) = 0$ and $i \ge 0$, we have that there exists a c such that
$\alpha(c) \le i < \alpha(c + 1)$. Given this c, we then define $f(i)$ as follows.

$$f(i) = \begin{cases} \mathit{len}(T') & \text{if } i \ge \mathit{len}(T) \\ \beta(c) & \text{if } i < \mathit{len}(T) \end{cases}$$

[Figure 3.3: Example depicting the sequences, functions, and variables involved in the proof of
Lemma 8.]

Essentially, by discarding the first i elements of T, we have changed the starting point
of our trace and thus also the starting point for the matching functions $\alpha$ and $\beta$. The
constant c is the index for $\alpha$ that brackets i. That is, $\alpha(c) \le i < \alpha(c + 1)$.
We use this value to appropriately adjust the starting point of T'. Figure 3.3 gives an overview.

We first present the proof for the $T_i \sim_E T'_{f(i)}$ conjunct and the properties of $f$, then
we give the proof of $T_{f^{-1}(i)} \sim_E T'_i$ and the properties of $f^{-1}$.

$T_i \sim_E T'_{f(i)}$ and Properties of $f$

We first handle the case where $i \ge \mathit{len}(T)$. In this case, $T_i = \epsilon$ and
$T'_{f(i)} = \epsilon$ and $\epsilon \sim_E \epsilon$. We now consider the case where
$i < \mathit{len}(T)$.

We need to produce functions $\alpha'$ and $\beta'$ satisfying the conditions in Definition 18.
In constructing these, we are allowed to use the $\alpha$ and $\beta$ that we know exist due to the
assumption $T \sim_E T'$ (formulas (3.10), (3.11), and (3.12)). The functions are as follows.

$$\alpha'(n) = \max(\alpha(n + c) - i, 0) \tag{3.13}$$

$$\beta'(n) = \max(\beta(n + c) - f(i), 0) \tag{3.14}$$
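The construction of the shifted witnesses in (3.13) and (3.14) can be exercised on a small example. The sketch below assumes finite traces given as Python strings, matching functions given as boundary lists extended by counting up by one, and equality as the relation E; the helpers `extend`, `bracket`, and `shifted_match` are names made up for this illustration. It builds $\alpha'$ and $\beta'$ for each position i inside the first trace and checks that they witness a match between the suffix at i and the suffix at f(i).

```python
def extend(points):
    """Extend a list of boundaries to a strictly increasing function on the naturals."""
    return lambda i: points[i] if i < len(points) else points[-1] + i - len(points) + 1


def matches(t, t2, alpha, beta, related):
    """Condition of Definition 18 for finite traces (alpha, beta strictly increasing)."""
    if alpha(0) != 0 or beta(0) != 0:
        return False
    for i in range(max(len(t), len(t2)) + 1):
        for j in range(alpha(i), alpha(i + 1)):
            for k in range(beta(i), beta(i + 1)):
                if (j < len(t)) != (k < len(t2)):
                    return False
                if j < len(t) and not related(t[j], t2[k]):
                    return False
    return True


def bracket(fn, i):
    """Return the c with fn(c) <= i < fn(c + 1), for strictly increasing fn with fn(0) = 0."""
    c = 0
    while fn(c + 1) <= i:
        c += 1
    return c


def shifted_match(t, t2, alpha, beta, related, i):
    """Check that the suffix of t at i matches the suffix of t2 at f(i), for i < len(t)."""
    c = bracket(alpha, i)
    f_i = beta(c)                                      # f(i) = beta(c)

    def alpha_p(n):
        return max(alpha(n + c) - i, 0)                # (3.13)

    def beta_p(n):
        return max(beta(n + c) - f_i, 0)               # (3.14)

    return matches(t[i:], t2[f_i:], alpha_p, beta_p, related)


def same(a, b):
    return a == b


T, T2 = "aabbb", "abb"
alpha, beta = extend([0, 2, 5]), extend([0, 1, 3])
all_suffixes_match = all(
    shifted_match(T, T2, alpha, beta, same, i) for i in range(len(T))
)
print(all_suffixes_match)
```

For instance, at i = 3 the bracketing index is c = 1, so f(3) = 1 and the suffixes compared are "bb" and "bb".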

$\alpha'(0) = \beta'(0) = 0$. We first show that $\alpha'(0) = \beta'(0) = 0$. We have
$\alpha'(0) = \max(\alpha(c) - i, 0)$. From the definition of c, above, we have
$\alpha(c) \le i$. This implies $\alpha(c) - i \le 0$ which implies
$\max(\alpha(c) - i, 0) = 0$. For $\beta'(0)$ we have $\beta'(0) = \max(\beta(c) - f(i), 0)$ and
$f(i) = \beta(c)$, which gives us $\beta'(0) = \max(\beta(c) - \beta(c), 0) = 0$.


Strictly Increasing. We must also check that $\alpha'$ and $\beta'$ are strictly increasing. We
will first consider $\alpha'$. To show $\alpha'$ is strictly increasing, it suffices to show that
$\alpha'(1) > 0$. This is due to the max operation in the definition of $\alpha'$ and the fact that
$\alpha$ is strictly increasing. Given the definition of $\alpha'$ (3.13), we have that if
$\alpha'(n) > 0$ for some n, then $\alpha'(n) = \alpha(n + c) - i$. Since $\alpha$ is strictly
increasing, we have $\alpha(n + c + 1) > \alpha(n + c)$ and thus
$\alpha(n + c + 1) - i > \alpha(n + c) - i$ and finally $\alpha'(n + 1) > \alpha'(n)$. Thus,
$\alpha'(n) > 0$ implies $\alpha'$ is strictly increasing on the interval $[n, \infty)$. As we
have already shown $\alpha'(0) = 0$, showing $\alpha'(1) > 0$ will give us that $\alpha'$ is
strictly increasing on the interval $[0, \infty)$, as desired.

To show that $\alpha'(1) > 0$, note that $\alpha'(1) = \max(\alpha(1 + c) - i, 0)$. We have from
our choice of c that $i < \alpha(c + 1)$. This implies $\alpha(1 + c) - i > 0$ which implies
$\max(\alpha(1 + c) - i, 0) > 0$.

The case for $\beta'$ is similar. Since $\beta$ is also strictly increasing and $\beta'$ is defined
using max with 0, the same reasoning applies and to show $\beta'$ is strictly increasing it
suffices to show that $\beta'(1) > 0$. We have $\beta'(1) = \max(\beta(1 + c) - f(i), 0)$. The
definition of $f(i)$ is $\beta(c)$, so we have $\beta'(1) = \max(\beta(1 + c) - \beta(c), 0)$. Since
$\beta$ is strictly increasing we have $\beta(1 + c) > \beta(c)$ implying that $\beta'(1) > 0$.

End of Last Blocks Coincide. Let j' and k' satisfy $\alpha'(i') \le j' < \alpha'(i' + 1)$ and
$\beta'(i') \le k' < \beta'(i' + 1)$. We must show that
$(j' < \mathit{len}(T_i)) \Leftrightarrow (k' < \mathit{len}(T'_{f(i)}))$.
Expanding the definition of $\alpha'$ and $\beta'$ we have

$$\alpha(i' + c) - i \le j' < \alpha(i' + 1 + c) - i$$

$$\beta(i' + c) - f(i) \le k' < \beta(i' + 1 + c) - f(i)$$

Rewriting by moving $i$ and $f(i)$ to the inside of the inequalities, we obtain

$$\alpha(i' + c) \le j' + i < \alpha(i' + 1 + c) \tag{3.15}$$

$$\beta(i' + c) \le k' + f(i) < \beta(i' + 1 + c) \tag{3.16}$$

Note that now we have $j' + i$ is a quantity bounded between $\alpha(i' + c)$ and
$\alpha(i' + c + 1)$ (consecutive values of $\alpha$) and similarly for $\beta$ in the second
formula. By (3.12) we then have
$(j' + i < \mathit{len}(T)) \Leftrightarrow (k' + f(i) < \mathit{len}(T'))$. This implies

$$(j' < \mathit{len}(T) - i) \Leftrightarrow (k' < \mathit{len}(T') - f(i))$$

Since $\mathit{len}(T_i) = \mathit{len}(T) - i$ and
$\mathit{len}(T'_{f(i)}) = \mathit{len}(T') - f(i)$ this gives us

$$(j' < \mathit{len}(T_i)) \Leftrightarrow (k' < \mathit{len}(T'_{f(i)}))$$

which is our goal.

E-related. To show that $j' < \mathit{len}(T_i) \Rightarrow T_i(j') \mathrel{E} T'_{f(i)}(k')$ we
first assume $j' < \mathit{len}(T_i)$ and apply the conclusion above (that
$j' < \mathit{len}(T_i) \Leftrightarrow k' < \mathit{len}(T'_{f(i)})$) to conclude
$k' < \mathit{len}(T'_{f(i)})$. This ensures that both $T_i(j')$ and $T'_{f(i)}(k')$ are defined.
Next, we note that $T_i(j') = T(i + j')$ and $T'_{f(i)}(k') = T'(f(i) + k')$. Thus, it suffices
to show that $T(i + j') \mathrel{E} T'(f(i) + k')$. From (3.15), (3.16), and (3.12) we have
$T(j' + i) \mathrel{E} T'(k' + f(i))$ which, together with commutativity of +, proves our goal.

Monotonicity of $f$. Recall that for $i \ge \mathit{len}(T)$ we have $f(i) = \mathit{len}(T')$ and
for $i < \mathit{len}(T)$ we have $f(i) = \beta(c)$ for the c such that
$\alpha(c) \le i < \alpha(c + 1)$. We now prove that such an $f$ is monotonic. Suppose $a \le b$.
We will show that $f(a) \le f(b)$. There are three cases. If $a \ge \mathit{len}(T)$ then
$b \ge \mathit{len}(T)$ and $f(a) = f(b) = \mathit{len}(T')$. If $a < \mathit{len}(T)$ and
$b \ge \mathit{len}(T)$ then $f(b) = \mathit{len}(T')$. For $f(a)$, we first choose c such that
$\alpha(c) \le a < \alpha(c + 1)$. By (3.12) and $a < \mathit{len}(T)$ we then have
$\beta(c) < \mathit{len}(T')$. Since $f(a) = \beta(c)$ we have $f(a) < \mathit{len}(T')$. Thus
$f(a) \le f(b)$.

Finally, we consider $a < \mathit{len}(T)$ and $b < \mathit{len}(T)$. We first choose c such that
$\alpha(c) \le a < \alpha(c + 1)$ and d such that $\alpha(d) \le b < \alpha(d + 1)$. Since $\alpha$
is strictly increasing, this can always be done. Since $a \le b$ and $\alpha$ is strictly
increasing, we have $c \le d$. Now, since $c \le d$ and $\beta$ is strictly increasing, we have
$\beta(c) \le \beta(d)$. Since $f(a) = \beta(c)$ and $f(b) = \beta(d)$ we then have
$f(a) \le f(b)$.

Definition of $f^{-1}$: We are given some $i \ge 0$. We first let d be the number such that
$\beta(d) \le i < \beta(d + 1)$. Since $\beta$ is strictly increasing, such a d always exists. We
then define $f^{-1}(i)$ as follows.

$$f^{-1}(i) = \begin{cases} \mathit{len}(T) & \text{if } i \ge \mathit{len}(T') \\ \alpha(d) & \text{if } i < \mathit{len}(T') \end{cases}$$

$T_{f^{-1}(i)} \sim_E T'_i$ and Properties of $f^{-1}$

We now show that $\forall i.\ T_{f^{-1}(i)} \sim_E T'_i$. Similar to before, the $\alpha'$ and
$\beta'$ that show this are

$$\alpha'(n) = \max(\alpha(n + d) - f^{-1}(i), 0) \tag{3.17}$$

$$\beta'(n) = \max(\beta(n + d) - i, 0)$$

For $i \ge \mathit{len}(T')$, we have $T'_i = \epsilon$ and
$T_{f^{-1}(i)} = T_{\mathit{len}(T)} = \epsilon$. Since $\epsilon \sim_E \epsilon$, we have
$T_{f^{-1}(i)} \sim_E T'_i$. We next consider the case where $i < \mathit{len}(T')$, considering in
turn each property that must hold of $\alpha'$ and $\beta'$.

$\alpha'(0) = \beta'(0) = 0$. We have $\alpha'(0) = \max(\alpha(d) - f^{-1}(i), 0)$. We have
$f^{-1}(i) = \alpha(d)$. Thus, $\alpha'(0) = \max(\alpha(d) - \alpha(d), 0) = 0$. For $\beta'(0)$,
we have $\beta'(0) = \max(\beta(0 + d) - i, 0)$. We have from our choice of d that
$\beta(d) \le i$. Thus, $\beta(d) - i \le 0$ and $\max(\beta(d) - i, 0) = 0$.

Strictly Increasing. As before, $\alpha'(1) > 0$ will be sufficient to prove $\alpha'$ is strictly
increasing (given the assumption that $\alpha$ is strictly increasing) and similarly for $\beta'$.
We have $\alpha'(1) = \max(\alpha(1 + d) - f^{-1}(i), 0) = \max(\alpha(1 + d) - \alpha(d), 0)$.
Since $\alpha$ is strictly increasing, we have $\alpha(1 + d) - \alpha(d) > 0$ which implies
$\alpha'(1) > 0$.

For $\beta'(1)$, we have $\beta'(1) = \max(\beta(1 + d) - i, 0)$. We have from our choice of d that
$i < \beta(d + 1)$ which implies $\beta(1 + d) - i > 0$ and thus $\beta'(1) > 0$.

End of Last Blocks Coincide. Suppose $\alpha'(i') \le j' < \alpha'(i' + 1)$ and
$\beta'(i') \le k' < \beta'(i' + 1)$. We must show that
$(j' < \mathit{len}(T_{f^{-1}(i)})) \Leftrightarrow (k' < \mathit{len}(T'_i))$.

Expanding the definition of $\alpha'$ and $\beta'$ we have

$$\alpha(i' + d) - f^{-1}(i) \le j' < \alpha(i' + 1 + d) - f^{-1}(i)$$

$$\beta(i' + d) - i \le k' < \beta(i' + d + 1) - i$$

Rewriting by moving $i$ and $f^{-1}(i)$ to the inside of the inequalities, we obtain

$$\alpha(i' + d) \le j' + f^{-1}(i) < \alpha(i' + 1 + d) \tag{3.18}$$

$$\beta(i' + d) \le k' + i < \beta(i' + d + 1) \tag{3.19}$$

Note that now we have $j' + f^{-1}(i)$ is a quantity bounded between $\alpha(i' + d)$ and
$\alpha(i' + d + 1)$ (consecutive values of $\alpha$) and similarly for $\beta$ in the second
formula. By (3.12) we then have
$(j' + f^{-1}(i) < \mathit{len}(T)) \Leftrightarrow (k' + i < \mathit{len}(T'))$. This implies
$(j' < \mathit{len}(T) - f^{-1}(i)) \Leftrightarrow (k' < \mathit{len}(T') - i)$. Since
$\mathit{len}(T_{f^{-1}(i)}) = \mathit{len}(T) - f^{-1}(i)$ and
$\mathit{len}(T'_i) = \mathit{len}(T') - i$ this gives us
$(j' < \mathit{len}(T_{f^{-1}(i)})) \Leftrightarrow (k' < \mathit{len}(T'_i))$ which is our goal.


E-related: To show that $j' < \mathit{len}(T_{f^{-1}(i)}) \Rightarrow (T_{f^{-1}(i)}(j')) \mathrel{E} (T'_i(k'))$ we first assume that $j' < \mathit{len}(T_{f^{-1}(i)})$ and apply our result above to conclude $k' < \mathit{len}(T'_i)$. This ensures that both $T_{f^{-1}(i)}(j')$ and $T'_i(k')$ are defined. We next note that $T_{f^{-1}(i)}(j') = T(f^{-1}(i) + j')$ and $T'_i(k') = T'(i + k')$. Thus, it suffices to show that $(T(f^{-1}(i) + j')) \mathrel{E} (T'(i + k'))$. From (3.18), (3.19), and (3.12) we have $(T(j' + f^{-1}(i))) \mathrel{E} (T'(k' + i))$ which, together with commutativity of $+$, proves our goal.

Monotonicity of $f^{-1}$: Recall that for $i \geq \mathit{len}(T')$ we have $f^{-1}(i) = \mathit{len}(T)$ and for $i < \mathit{len}(T')$ we have $f^{-1}(i) = \alpha(d)$ for the $d$ such that $\beta(d) \leq i < \beta(d + 1)$. We now prove that such an $f^{-1}$ is monotonic. Suppose $a \leq b$. We will show that $f^{-1}(a) \leq f^{-1}(b)$. There are three cases. If $a \geq \mathit{len}(T')$ and $b \geq \mathit{len}(T')$ then $f^{-1}(a) = f^{-1}(b) = \mathit{len}(T)$. If $a < \mathit{len}(T')$ and $b \geq \mathit{len}(T')$ then $f^{-1}(b) = \mathit{len}(T)$. For $f^{-1}(a)$, we first choose the $d$ such that $\beta(d) \leq a < \beta(d + 1)$. By (3.12) and $a < \mathit{len}(T')$ we then have $\alpha(d) < \mathit{len}(T)$. Since $f^{-1}(a) = \alpha(d)$ we have $f^{-1}(a) < \mathit{len}(T)$. Thus $f^{-1}(a) \leq f^{-1}(b)$.

Finally, we consider $a < \mathit{len}(T')$ and $b < \mathit{len}(T')$. To compute $f^{-1}(a)$ and $f^{-1}(b)$, we first choose $d_1$ such that $\beta(d_1) \leq a < \beta(d_1 + 1)$ and $d_2$ such that $\beta(d_2) \leq b < \beta(d_2 + 1)$. Since $\beta$ is strictly increasing and $a \leq b$ we have $d_1 \leq d_2$. Since $\alpha$ is strictly increasing, we then have $\alpha(d_1) \leq \alpha(d_2)$. Since $f^{-1}(a) = \alpha(d_1)$ and $f^{-1}(b) = \alpha(d_2)$ we then have $f^{-1}(a) \leq f^{-1}(b)$.


Inverse Relationship: We now show that $f^{-1}(f(i)) \leq i$. Let $i$ be an arbitrary natural number. If $i \geq \mathit{len}(T)$ then $f(i) = \mathit{len}(T')$ and $f^{-1}(\mathit{len}(T')) = \mathit{len}(T)$. Since $i \geq \mathit{len}(T)$ we have $f^{-1}(f(i)) = \mathit{len}(T) \leq i$. We now consider the case where $i < \mathit{len}(T)$.

In this case, we have $f(i) = \beta(c)$ for some $c$ such that $\alpha(c) \leq i < \alpha(c + 1)$ and $f^{-1}(f(i)) = f^{-1}(\beta(c)) = \alpha(d)$ for some $d$ such that $\beta(d) \leq \beta(c) < \beta(d + 1)$. Since $\beta$ is strictly increasing, $\beta(d) \leq \beta(c) < \beta(d + 1)$ implies that $c = d$. We can then use this equality to derive from $f^{-1}(f(i)) = \alpha(d)$ the fact that $f^{-1}(f(i)) = \alpha(c)$. Since we have $\alpha(c) \leq i$ we then have $f^{-1}(f(i)) \leq i$, which was our goal.

□ 
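The two maps used in this proof can be made concrete with a small sketch. It assumes, as in the text, that $\alpha$ and $\beta$ are strictly increasing block-start functions with $\alpha(0) = \beta(0) = 0$; the helper names (`block_index`, `make_maps`) and the example traces are hypothetical and serve only as an illustration.

```python
def block_index(starts, i):
    """Return the d with starts(d) <= i < starts(d + 1).

    `starts` is assumed strictly increasing with starts(0) == 0.
    """
    d = 0
    while starts(d + 1) <= i:
        d += 1
    return d


def make_maps(alpha, beta, len_t, len_t2):
    """Build f and f_inv from the block starts alpha (for T) and beta (for T')."""
    def f(i):
        if i >= len_t:
            return len_t2
        return beta(block_index(alpha, i))

    def f_inv(i):
        if i >= len_t2:
            return len_t
        return alpha(block_index(beta, i))

    return f, f_inv


# Hypothetical example: T = a a b c c c (length 6), T' = a b b b c (length 5).
def alpha(n):
    return [0, 2, 3][n] if n < 3 else 6 + (n - 3)


def beta(n):
    return [0, 1, 4][n] if n < 3 else 5 + (n - 3)


f, f_inv = make_maps(alpha, beta, 6, 5)
f_inv_values = [f_inv(i) for i in range(8)]
```

On this example `f_inv` is monotonic and `f_inv(f(i)) <= i` holds for every index tried, which matches the two properties established above.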


3.2.2  Stuttering  Containment 

We now use this notion of stuttering equivalence to define stuttering containment for sets of traces and define stuttering equivalence of trace sets as mutual containment.

Definition 21. Let $\mathcal{T}$ and $\mathcal{T}'$ be sets of traces. Then $\mathcal{T}'$ E-stuttering contains $\mathcal{T}$, written $\mathcal{T} \leq_E \mathcal{T}'$, iff $\forall T \in \mathcal{T}.\ \exists T' \in \mathcal{T}'.\ T \sim_E T'$. We say $\mathcal{T}$ is E-stuttering equivalent to $\mathcal{T}'$, written $\mathcal{T} \approx_E \mathcal{T}'$, iff $\mathcal{T} \leq_E \mathcal{T}'$ and $\mathcal{T}' \leq_E \mathcal{T}$.

When $\mathcal{T} \approx_E \mathcal{T}'$ and the relation $E$ is clear from context we will simply say that $\mathcal{T}$ and $\mathcal{T}'$ are stuttering equivalent.
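As a rough illustration of Definition 21, the following sketch checks containment for finite sets of finite traces. It rests on two assumptions that are not part of the formal development: the equivalence relation $E$ is represented by a hypothetical `key` function (two states are related iff their keys are equal), and two finite traces are treated as stuttering equivalent when collapsing runs of equivalent states yields the same sequence of classes.

```python
from itertools import groupby


def destutter(trace, key):
    """Collapse each maximal run of E-equivalent states into its class key."""
    return [k for k, _ in groupby(trace, key=key)]


def stutter_equiv(t1, t2, key):
    """Finite-trace approximation of T ~_E T'."""
    return destutter(t1, key) == destutter(t2, key)


def stutter_contained(ts, ts2, key):
    """True iff every trace in ts has a stuttering-equivalent trace in ts2."""
    return all(any(stutter_equiv(t, t2, key) for t2 in ts2) for t in ts)


def stutter_equiv_sets(ts, ts2, key):
    """Mutual containment, as in the second half of Definition 21."""
    return stutter_contained(ts, ts2, key) and stutter_contained(ts2, ts, key)


# States are (a, b) pairs; this E ignores the second component.
key = lambda state: state[0]
concrete = [[(1, 0), (1, 1), (0, 1)]]
abstract = [[(1, 9), (0, 9)], [(2, 9), (1, 9), (0, 9)]]
```

Here `abstract` contains `concrete` but not the other way around, since the second abstract trace has no counterpart; the two sets are therefore related by containment only.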

We  can  now  obtain  a  version  of  Theorem  12  for  stuttering  containment. 

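Theorem 13. Let $E''$ be an equivalence relation satisfying the following.

$$\forall a, b, c.\ (a \mathrel{E} b \wedge b \mathrel{E'} c \Rightarrow a \mathrel{E''} c)$$

Then $\mathcal{T} \leq_E \mathcal{T}'$ and $\mathcal{T}' \leq_{E'} \mathcal{T}''$ implies $\mathcal{T} \leq_{E''} \mathcal{T}''$.

Proof. We must show the following.

$$\forall T \in \mathcal{T}.\ \exists T'' \in \mathcal{T}''.\ T \sim_{E''} T''$$

From our assumption $\mathcal{T} \leq_E \mathcal{T}'$ we have

$$\forall T \in \mathcal{T}.\ \exists T' \in \mathcal{T}'.\ T \sim_E T'$$

From our assumption $\mathcal{T}' \leq_{E'} \mathcal{T}''$ we have

$$\forall T' \in \mathcal{T}'.\ \exists T'' \in \mathcal{T}''.\ T' \sim_{E'} T''$$

Combining these we have

$$\forall T \in \mathcal{T}.\ \exists T' \in \mathcal{T}', T'' \in \mathcal{T}''.\ T \sim_E T' \wedge T' \sim_{E'} T''$$

We can then apply Theorem 12 to obtain

$$\forall T \in \mathcal{T}.\ \exists T' \in \mathcal{T}', T'' \in \mathcal{T}''.\ T \sim_{E''} T''$$

Eliminating the quantification on $T'$ then gives us our goal. □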




3  Abstractions  and  Program  Properties 


3.2.3  Programs  and  Stuttering  Equivalence 

We  now  tie  these  general  notions  of  stuttering  equivalence  and  containment  to  programs 
and  give  some  examples  of  stuttering  equivalent  programs. 

The trace sets of interest for programs are those obtained when executing the program from a state satisfying some precondition. Thus, for some programs $P$ and $P'$ and preconditions $Q$ and $Q'$, we will be interested in questions such as whether the relation $\mathit{traces}(\langle\langle P \mid Q \rangle\rangle) \leq_E \mathit{traces}(\langle\langle P' \mid Q' \rangle\rangle)$ holds for some equivalence relation $E$. Since the semantics of a program can be viewed as the set of traces produced by that program, this provides a connection between the semantics of $P$ and the semantics of $P'$ (provided each is started in a satisfactory initial state). This will form the basis of our notion of abstraction.

Definition 22. A program $P'$ with precondition $Q'$ is an abstraction of a program $P$ with precondition $Q$, with respect to an equivalence relation $E$, iff $Q$ and $Q'$ are separation logic formulae and

$$\mathit{traces}(\langle\langle P \mid Q \rangle\rangle) \leq_E \mathit{traces}(\langle\langle P' \mid Q' \rangle\rangle)$$

When $Q$, $Q'$ and $E$ are clear from context, we will just say that $P'$ is an abstraction of $P$.

This property can be more or less useful depending on the particular preconditions involved (and also depending on the equivalence relation utilized). For example, if $Q$ is false, then we can establish this for any $P$, $P'$, $Q'$. The conciseness of the term abstraction is useful in informal discussions, and we will restrict ourselves to using it in such settings. For the presentation of the formal development, we will use the more precise notation developed previously (i.e. $\leq_E$, etc.).

The strongest correspondence between programs $P$ and $P'$ is given by the statement $\mathit{traces}(\langle\langle P \mid \mathit{true} \rangle\rangle) \approx_{=} \mathit{traces}(\langle\langle P' \mid \mathit{true} \rangle\rangle)$, where $=$ is the identity relation on execution states. Since our execution states include the current continuation, this will only hold when $P = P'$, where the equality is up to reordering of labeled continuations (with the initial continuation not subject to reordering). In order to get a more interesting (and weaker) correspondence, we move to the following notion of equality. Let $\doteq$ be the least relation satisfying the following.

$$\begin{array}{rcl}
\mathit{goto}(l, (s, h)) & \doteq & \mathit{goto}(l, (s, h)) \\
(k, (s, h)) & \doteq & (k', (s, h)) \\
\mathit{final}(s, h) & \doteq & \mathit{final}(s, h) \\
\mathit{error} & \doteq & \mathit{error}
\end{array}$$


Note that $\doteq$ identifies exactly those states that are the same modulo the current continuation $k$. Now we can describe programs that involve different continuations, but which produce stuttering equivalent sequences of store, heap pairs (and location, store, heap triples in the case of goto states). Figure 3.4 lists four programs that are stuttering equivalent in the sense that for any $P$ and $P'$ in the figure, we have $\mathit{traces}(\langle\langle P \mid \mathit{true} \rangle\rangle) \approx_{\doteq} \mathit{traces}(\langle\langle P' \mid \mathit{true} \rangle\rangle)$. In each case, the traces of $P_i$ consist of one occurrence of the state $\mathit{goto}(L_0, (s, h))$ followed by either one (as in $P_1$, $P_2$) or two (as in $P_3$, $P_4$) occurrences of the state $(k, (s, h))$ for some $k$, followed by one (as in $P_1$, $P_3$, $P_4$) or two (as in $P_2$) occurrences of the state $(k, (s[a \mapsto 0], h))$, followed by the traces starting from $\mathit{goto}(L_1, (s[a \mapsto 0], h))$. Examining one of the example programs in detail, we see that traces produced by $P_3$ have the following form.

    goto(L0, (s, h))
    (branch ... end, (s, h))
    (a := 0; goto L1, (s, h))
    (goto L1, (s[a -> 0], h))
    goto(L1, (s[a -> 0], h))
    (halt, (s[a -> 0], h))
    final(s[a -> 0], h)

    P1 =def
      L0 : a := 0; goto L1;
      L1 : halt
      end

    P2 =def
      L0 : a := 0; a := 0; goto L1;
      L1 : halt
      end

    P3 =def
      L0 : branch true => a := 0; goto L1,
                  true => a := 0; goto L1
           end;
      L1 : halt
      end

    P4 =def
      L0 : branch x > 0  => a := 0; goto L1,
                  x <= 0 => a := 0; goto L1
           end;
      L1 : halt
      end

Figure 3.4: Four examples of stuttering equivalent programs. Each example involves a different continuation at L0.

It is also instructive to consider which changes violate stuttering equivalence. The program below, while quite similar to $P_4$, is not stuttering equivalent from precondition true.

    P4' =def
      L0 : branch x > 0 => a := 0; goto L1,
                  x < 0 => a := 0; goto L1
           end;
      L1 : halt
      end


The reason this program is not stuttering equivalent to the programs in Figure 3.4 is that, due to the lack of a branch for $x = 0$ in the continuation at $L_0$, $P_4'$ does not contain traces in which $s(x) = 0$ (where $s$ is the store associated with some state in the trace). However, $P_4'$ is stuttering equivalent to the other programs when evaluated from the precondition $x \neq 0$. This is an example of the importance of the initial conditions (as represented by the precondition). By removing certain sets of traces from consideration, the precondition can cause programs that do not correspond in general to be stuttering equivalent.




There are, however, programs which cannot be made stuttering equivalent according to $\doteq$ regardless of the precondition. Consider the program below.

    P1' =def
      L0 : a := 0; b := 0; b := 1; goto L1;
      L1 : halt
      end

This program is similar to $P_1$ except that it mentions an additional variable $b$. The traces of $P_1'$ contain states where $s(b) = 0$ and states where $s(b) = 1$. The value of $b$ in any trace of $P_1$ will always be constant, preventing these two programs from being related by $\approx_{\doteq}$ for any precondition other than false.

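However, these programs are stuttering equivalent if we change the equivalence relation on execution states to one that does not take into account the value of $b$. Consider the equivalence relation given below, which is the $=_V$ relation on stores (Definition 1) lifted to execution states.

Definition 23. $\doteq_V$ is the least relation satisfying the following.

$$\begin{array}{rcll}
\mathit{goto}(l, (s, h)) & \doteq_V & \mathit{goto}(l, (s', h)) & \text{iff } s =_V s' \\
(k, (s, h)) & \doteq_V & (k', (s', h)) & \text{iff } s =_V s' \\
\mathit{final}(s, h) & \doteq_V & \mathit{final}(s', h) & \text{iff } s =_V s' \\
\mathit{error} & \doteq_V & \mathit{error} &
\end{array}$$

With this relation, we can now specify the correspondence between $P_1$ and $P_1'$. We have $\mathit{traces}(\langle\langle P_1 \mid \mathit{true} \rangle\rangle) \approx_{\doteq_{\{a\}}} \mathit{traces}(\langle\langle P_1' \mid \mathit{true} \rangle\rangle)$.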

Heap-Manipulating Examples: New commands can also be added to heap-manipulating programs while preserving this version of stuttering equivalence. Figure 3.5 gives some examples of relationships between programs that involve the heap. $P_5$ gives a program that frees a linked list at $x$ with length $a$. As it frees elements, it keeps track of the length of the remaining portion of the list by updating $a$.

    P5 =def
      L0 : goto L1;
      L1 : branch x != nil =>
                    t := x;
                    x := x.next;
                    free t;
                    a := a - 1;
                    goto L1,
                  x = nil => halt
           end

    P6 =def
      L0 : goto L1;
      L1 : branch a > 0 =>
                    t := x;
                    x := x.next;
                    free t;
                    a := a - 1;
                    goto L1,
                  a = 0 => halt
           end

    P7 =def
      L0 : goto L1;
      L1 : branch a > 0 =>
                    a := a - 1;
                    goto L1,
                  a = 0 => halt
           end

    P8 =def
      L0 : a := ?; goto L1;
      L1 : branch a > 0 =>
                    a := a - 1;
                    goto L1,
                  a = 0 => halt
           end

$$\begin{array}{rcl}
\mathit{traces}(\langle\langle P_5 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle) & \approx_{\doteq_{\{x,t,a\}}} & \mathit{traces}(\langle\langle P_6 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle) \\
\mathit{traces}(\langle\langle P_6 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle) & \approx_{\circeq_{\{a\}}} & \mathit{traces}(\langle\langle P_7 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle) \\
\mathit{traces}(\langle\langle P_7 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle) & \leq_{\circeq_{\{a\}}} & \mathit{traces}(\langle\langle P_8 \mid \exists a.\ \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle)
\end{array}$$

Figure 3.5: Increasingly weaker abstractions of $P_5$.


When started from the precondition $\mathit{ls}(a, x, \mathit{nil})$ this program is safe, in the sense that no traces from this precondition end with error. This corresponds to the LTSL property $\neg(\mathbf{F}(\mathit{err}))$.

The program also has the property that for every state of the form $\mathit{goto}(L_1, (s, h))$, we have $(s, h) \models_X \mathit{ls}(a, x, \mathit{nil})$. Put another way, $\mathit{ls}(a, x, \mathit{nil})$ is an invariant of location $L_1$. This corresponds to the LTSL property $\mathbf{G}(\mathit{atloc}(L_1) \Rightarrow \mathit{ls}(a, x, \mathit{nil}))$.

Finally, the program always terminates, meaning that its trace set contains no infinite traces. The LTSL formula corresponding to termination is $\mathbf{F}(\mathit{final})$.

Program $P_6$ is stuttering equivalent to $P_5$ in the sense that they satisfy

$$\mathit{traces}(\langle\langle P_5 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle) \approx_{\doteq_{\{x,t,a\}}} \mathit{traces}(\langle\langle P_6 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle)$$

That is, when started in a state satisfying $\mathit{ls}(a, x, \mathit{nil})$, their traces consist of the same sequence of memory states with the only difference being possible repetition of some states. In this case, there is not even any repetition. The only difference between the two programs is that $P_5$ branches on $x \neq \mathit{nil}$, whereas $P_6$ branches on $a > 0$. Since $a$ is always equal to the length of the list at $x$, these conditions are equivalent and result in the same set of traces.

Program  P7  consists  solely  of  the  commands  involving  a.  Such  a  program  is  not  stut¬ 
tering  equivalent  to  P5  or  P6  given  any  of  the  equality  relations  on  execution  states  that 
have  been  discussed  so  far.  However,  it  is  stuttering  equivalent  given  the  relation  below. 

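Definition 24. $\circeq_V$ is the least relation on execution states that satisfies the following.

$$\begin{array}{rcll}
\mathit{goto}(l, (s, h)) & \circeq_V & \mathit{goto}(l, (s', h')) & \text{iff } s =_V s' \\
(k, (s, h)) & \circeq_V & (k', (s', h')) & \text{iff } s =_V s' \\
\mathit{final}(s, h) & \circeq_V & \mathit{final}(s', h') & \text{iff } s =_V s' \\
\mathit{error} & \circeq_V & \mathit{error} &
\end{array}$$

The $\circeq_V$ relation is the same as $\doteq_V$ except that the heaps are not required to be the same. We can now state the relationship between $P_6$ and $P_7$. It is

$$\mathit{traces}(\langle\langle P_6 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle) \approx_{\circeq_{\{a\}}} \mathit{traces}(\langle\langle P_7 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle)$$

and the same relation holds between $P_5$ and $P_7$.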
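The difference between Definitions 23 and 24 can be seen in a small executable sketch. The encoding is an assumption made for illustration: execution states are tuples `("goto", l, (s, h))`, `("cmd", k, (s, h))`, `("final", (s, h))` or `("error",)`, stores and heaps are dictionaries, and the function names are hypothetical.

```python
def agree_on(s1, s2, variables):
    """Store relation =_V: both stores give every variable in V the same value.

    A variable missing from both stores counts as agreeing.
    """
    return all(s1.get(v) == s2.get(v) for v in variables)


def related(g1, g2, variables, same_heap):
    """Definition 23 when same_heap is True, Definition 24 when it is False."""
    if g1[0] != g2[0]:
        return False                      # different kinds of state never relate
    kind = g1[0]
    if kind == "error":
        return True
    if kind == "goto" and g1[1] != g2[1]:
        return False                      # goto states must share the label
    (s1, h1), (s2, h2) = g1[-1], g2[-1]   # continuations of "cmd" states are ignored
    if same_heap and h1 != h2:
        return False
    return agree_on(s1, s2, variables)


# A state of the list-freeing loop and a state of the purely numeric loop.
heap_state = ("cmd", "free t; ...", ({"a": 2, "x": 10, "t": 10}, {10: 11, 11: 0}))
numeric_state = ("cmd", "a := a - 1; ...", ({"a": 2}, {}))
```

With `V = {"a"}` the two example states are related once heaps are ignored, but not when the heaps are required to coincide, which is why the numeric program needs the weaker relation.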

The program $P_8$ is an example of a program that is not stuttering equivalent to any of the previous programs, but does stuttering contain the traces of some of them. We have the following.

$$\mathit{traces}(\langle\langle P_7 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle) \leq_{\circeq_{\{a\}}} \mathit{traces}(\langle\langle P_8 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle)$$

The program $P_8$ contains traces stuttering equivalent to the traces in $P_7$, but also contains traces where the non-deterministic assignment causes $a$ to have a value other than the length of the list.

The non-deterministic assignment can also be used to ensure that we consider executions where $a$ is the length of the list even when such a situation is not guaranteed by the precondition. For example, the following relationship holds.

$$\mathit{traces}(\langle\langle P_7 \mid \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle) \leq_{\circeq_{\{a\}}} \mathit{traces}(\langle\langle P_8 \mid \exists a.\ \mathit{ls}(a, x, \mathit{nil}) \rangle\rangle)$$

Note that we are abstracting a program that assumes $a$ is the length of the list by a program that only assumes there exists some length; the requirement that some program variable is storing the length is dropped in the precondition of $P_8$.

This  use  of  non-determinism  is  an  important  component  of  the  numeric  abstraction 
technique  that  is  the  subject  of  Chapters  4  and  5. 
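The containment between the list-freeing program and the non-deterministic counter can be mimicked with a deliberately simplified simulation. The sketch below is an assumption-laden model, not the formal semantics: it records only the value of `a` at each step (the relation in question looks only at `a`), it ignores the distinction between goto states and command states, and it models `a := ?` by an explicit `chosen` argument restricted to non-negative values.

```python
from itertools import groupby


def destutter(values):
    """Collapse consecutive repetitions of the same value."""
    return [v for v, _ in groupby(values)]


def run_list_free(n):
    """Values of `a` along a run that frees a list of length n with a == n."""
    cells = list(range(n))    # stand-in for the linked list at x
    a = n
    values = [a]              # initial state
    while cells:              # branch x != nil
        cells.pop(0)          # t := x; x := x.next; free t (a is unchanged here)
        values.append(a)
        a -= 1                # a := a - 1
        values.append(a)
    return values


def run_nondet_counter(initial_a, chosen):
    """Values of `a` along a run of the counter that starts with a := ?."""
    values = [initial_a]
    a = chosen                # result of the non-deterministic assignment
    values.append(a)
    while a > 0:              # branch a > 0
        a -= 1
        values.append(a)
    return values


concrete = destutter(run_list_free(3))
matching = destutter(run_nondet_counter(3, 3))
other = destutter(run_nondet_counter(3, 5))
```

The run in which the non-deterministic choice equals the list length matches the concrete run up to stuttering, while a run with a different choice has no concrete counterpart. This is the one-directional nature of containment.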


3.3  Stuttering  Equivalence  and  LTSL  Properties 

We  now  present  some  theorems  relating  stuttering  equivalence  and  containment  and  satis¬ 
faction  of  LTSL  properties. 

Definition 25. A state formula $\varsigma$ is E-invariant for an equivalence relation $E$ iff

$$\forall \gamma, \gamma'.\ \gamma \mathrel{E} \gamma' \Rightarrow ((\gamma \models_X \varsigma) \Leftrightarrow (\gamma' \models_X \varsigma))$$

An LTSL formula $\phi$ is E-invariant iff all state formulae in $\phi$ are E-invariant. The set of E-invariant LTSL formulae is denoted $\mathrm{LTSL}_E$.

In the case of a path formula containing the state formula $Q$, the definition above does not require that sub-formulas of $Q$ be E-invariant. However, all examples of E-invariant state formulae that we will present in this thesis are composed of E-invariant sub-formulas.

Formulae that are E-invariant are preserved by E-stuttering equivalence.

Theorem 14. If $\phi \in \mathrm{LTSL}_E$ and $T \sim_E T'$ then $T \models_X \phi$ if and only if $T' \models_X \phi$.

We first state an easy lemma, which follows directly from the definition of $\mathrm{LTSL}_E$.

Lemma 9. If $\phi \in \mathrm{LTSL}_E$, then for all path formulae $\phi'$ such that $\phi'$ is a sub-formula of $\phi$ we have $\phi' \in \mathrm{LTSL}_E$.

Proof. By the definition of $\mathrm{LTSL}_E$ (Definition 25) we have that all state formulae in $\phi$ are E-invariant. Since $\phi'$ is a sub-formula of $\phi$, the set of state formulae appearing in $\phi'$ is a subset of those appearing in $\phi$. Thus, all the state formulae in $\phi'$ are E-invariant and so $\phi' \in \mathrm{LTSL}_E$. □

We  now  turn  to  the  proof  of  the  theorem  above  (Theorem  14). 

Proof. The proof is by induction on the structure of $\phi$. We only consider the core connectives $\wedge$, $\neg$ and $\mathbf{U}$ as the other connectives are definable in terms of these (Theorem 9). We start with the base case, in which $\phi = \varsigma$ for some state formula $\varsigma$.

CASE $\phi = \varsigma$: We first consider the forward direction of the "if and only if." Suppose $T \models_X \varsigma$. From the semantics in Figure 3.2 we have that $\mathit{len}(T) > 0$ and $T(0) \models_X \varsigma$. From our assumption that $T \sim_E T'$ we have $\mathit{matches}(T, T', \alpha, \beta, E)$ and, by the definition of matches, this gives us $0 < \mathit{len}(T) \Leftrightarrow 0 < \mathit{len}(T')$ and $T(0) \mathrel{E} T'(0)$. Since we have $\mathit{len}(T) > 0$ this gives us $\mathit{len}(T') > 0$. From our assumption that $\phi \in \mathrm{LTSL}_E$ and Definition 25 we then have

$$\gamma \mathrel{E} \gamma' \Rightarrow ((\gamma \models_X \varsigma) \Leftrightarrow (\gamma' \models_X \varsigma))$$

Applying this to $T(0) \mathrel{E} T'(0)$ we obtain $T(0) \models_X \varsigma \Leftrightarrow T'(0) \models_X \varsigma$. As $T(0) \models_X \varsigma$ is one of our assumptions, we then have $T'(0) \models_X \varsigma$, which, combined with $\mathit{len}(T') > 0$, gives us $T' \models_X \varsigma$. The backward direction is the same, except that $T$ and $T'$ are exchanged.

CASE $\phi = \phi_1 \wedge \phi_2$: We first consider the forward direction of the "if and only if." We assume $T \models_X \phi_1 \wedge \phi_2$. By the semantics of $\wedge$ we then have $T \models_X \phi_1$ and $T \models_X \phi_2$. By Lemma 9 and $\phi \in \mathrm{LTSL}_E$ we have $\phi_1 \in \mathrm{LTSL}_E$ and $\phi_2 \in \mathrm{LTSL}_E$. This allows us to apply the inductive hypothesis to each of these formulae, yielding $T' \models_X \phi_1$ and $T' \models_X \phi_2$. Again applying the semantics of $\wedge$ we obtain $T' \models_X \phi_1 \wedge \phi_2$, which is our goal. The reverse implication is identical, but with $T$ and $T'$ exchanged.

CASE $\phi = \neg\phi_1$: We first consider the forward implication and assume $T \models_X \neg\phi_1$. The semantics of $\neg$ then give us that $T \not\models_X \phi_1$. The inductive hypothesis then gives us $T' \not\models_X \phi_1$ (since the conclusion of the theorem is an "if and only if"). From this, we apply the semantics of $\neg$ to obtain our goal: $T' \models_X \neg\phi_1$. The reverse implication is the same, but with $T$ and $T'$ exchanged.

CASE $\phi = \phi_1 \mathbin{\mathbf{U}} \phi_2$: As before, Lemma 9 tells us that $\phi_1 \in \mathrm{LTSL}_E$ and $\phi_2 \in \mathrm{LTSL}_E$, which is one condition needed to apply the inductive hypothesis.

The following derivation establishes the forward direction of the implication. We start from the assumption that $T \models_X \phi_1 \mathbin{\mathbf{U}} \phi_2$, which tells us that there is some $i$ satisfying the two initial assumptions below.

     1   $0 \leq i < \mathit{len}(T) \wedge (T_i \models_X \phi_2)$   (Given)
     2   $\forall j.\ 0 \leq j < i \Rightarrow (T_j \models_X \phi_1)$   (Given)
     3   $T \sim_E T'$   (Given)
     4   $T_i \sim_E T'_{f(i)}$   (Lemma 8 (for the $f$ defined in that lemma))
     5   $T'_{f(i)} \models_X \phi_2$   (Inductive Hypothesis: line 1 conjunct 2 and line 4)
     6       $0 \leq j' < f(i)$   (Assumption)
     7       $T_{f^{-1}(j')} \sim_E T'_{j'}$   (Lemma 8 ($f^{-1}$ defined in the Lemma))
     8       $j' < f(i)$   (6)
     9       $f^{-1}(j') \leq f^{-1}(f(i))$   (Lemma 8, monotonicity of $f^{-1}$)
    10       $f^{-1}(f(i)) \leq i$   (Lemma 8)
    11       $f^{-1}(j') \leq i$   (9 and 10)
    12       $T_{f^{-1}(j')} \models_X \phi_1$   (2 and 11)
    13       $T'_{j'} \models_X \phi_1$   (Inductive Hyp: 7 and 12)
    14   $\forall j'.\ 0 \leq j' < f(i) \Rightarrow (T'_{j'} \models_X \phi_1)$   ($\forall$-intro, $\Rightarrow$-intro: 6 and 13)
    15   $\exists i.\ T'_i \models_X \phi_2 \wedge \forall j'.\ 0 \leq j' < i \Rightarrow (T'_{j'} \models_X \phi_1)$   ($\exists$-intro ($f(i) \mapsto i$): 5 and 14)
    16   $T' \models_X \phi_1 \mathbin{\mathbf{U}} \phi_2$   (Semantics of $\mathbf{U}$)

As before, since $\sim_E$ is symmetric, the proof of the backward implication is the same as for the forward direction, but with $T$ and $T'$ exchanged. □
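Theorem 14 can be exercised on small finite traces with the following sketch. It is an illustration under stated assumptions rather than the formal semantics of Figure 3.2: formulas are nested tuples over the core connectives, state formulae are arbitrary Python predicates, the suffix $T_i$ is modeled as a list slice, and the example traces are hypothetical.

```python
def holds(formula, trace):
    """Evaluate a formula over a finite trace (a list of states)."""
    op = formula[0]
    if op == "state":                     # state formula: checked at position 0
        return len(trace) > 0 and formula[1](trace[0])
    if op == "not":
        return not holds(formula[1], trace)
    if op == "and":
        return holds(formula[1], trace) and holds(formula[2], trace)
    if op == "until":                     # some suffix satisfies p2, all earlier ones p1
        _, p1, p2 = formula
        return any(
            holds(p2, trace[i:]) and all(holds(p1, trace[j:]) for j in range(i))
            for i in range(len(trace))
        )
    raise ValueError(f"unknown connective: {op}")


true = ("state", lambda s: True)
eventually_final = ("until", true, ("state", lambda s: s == "final"))
two_until_one = ("until", ("state", lambda s: s == 2), ("state", lambda s: s == 1))

trace = [2, 1, 0, "final"]
stuttered = [2, 2, 1, 0, 0, 0, "final"]   # same states, some repeated
```

Both example formulas evaluate identically on `trace` and `stuttered`, as the theorem predicts for traces that differ only by repetition of states.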

A corollary of Theorem 14 is that stuttering containment preserves satisfaction of $\mathrm{LTSL}_E$ properties in one direction.

Corollary 1. If $\phi \in \mathrm{LTSL}_E$ and $S$, $S'$ are transition systems and $\mathit{traces}(S) \leq_E \mathit{traces}(S')$ then $S' \models_X \phi$ implies $S \models_X \phi$.

Proof. This follows from the fact that LTSL formulae are interpreted universally over trace sets. Suppose $S' \models_X \phi$. By Definition 17 this implies

$$\forall T' \in \mathit{traces}(S').\ T' \models_X \phi \qquad (3.20)$$

That $\mathit{traces}(S) \leq_E \mathit{traces}(S')$ implies the following.

$$\forall T \in \mathit{traces}(S).\ \exists T' \in \mathit{traces}(S').\ T \sim_E T' \qquad (3.21)$$

We now show $\forall T \in \mathit{traces}(S).\ T \models_X \phi$, which implies $S \models_X \phi$ by Definition 17. Suppose $T \in \mathit{traces}(S)$. By (3.21) we have $\exists T' \in \mathit{traces}(S').\ T \sim_E T'$. Then by Theorem 14 and (3.20) we have $T \models_X \phi$, which is our goal. □

These  results  are  not  new.  Analogous  theorems  are  presented  in  [Clarke  et  al.,  1999] 
and  [Clarke  and  Schlingloff,  2001].  Here  we  have  adapted  these  results  to  our  particular 
formal  setup,  with  separation  logic  formulae  as  the  state  formulae  for  the  temporal  logic 
and transition systems arising from programs in our source language.

3.3.1  Syntactic  Descriptions  of  E-invariance 

The theorems above are stated in terms of E-invariant LTSL formulae, and the definition of E-invariance (Definition 25) is given in terms of the satisfaction relation $\models_X$ for LTSL formulae. However, we can also give syntactic restrictions that enforce E-invariance for the equality relations $\doteq_V$ and $\circeq_V$. These syntactic restrictions are much easier to check than the semantic properties used in Definition 25.

Syntactic Description of $\doteq_V$-invariance

Definition 26. Let $\mathrm{LTSL}(V)$ be the set of LTSL formulae with free variables contained in the set $V$.

Theorem 15. If $\phi \in \mathrm{LTSL}(V)$ then $\phi$ is $\doteq_V$-invariant.

Proof. We must show that if the free variables of $\phi$ are contained in $V$, then all state formulae $\varsigma$ which are subterms of $\phi$ have the following property.

$$\forall \gamma, \gamma'.\ (\gamma \doteq_V \gamma') \Rightarrow ((\gamma \models_X \varsigma) \Leftrightarrow (\gamma' \models_X \varsigma))$$

We first note that if the free variables of $\phi$ are contained in $V$ and $\varsigma$ is a subterm of $\phi$, then the free variables of $\varsigma$ are contained in $V$. We now consider arbitrary $\gamma, \gamma'$ such that $\gamma \doteq_V \gamma'$ and show that $(\gamma \models_X \varsigma) \Leftrightarrow (\gamma' \models_X \varsigma)$. The proof is by case analysis on the state formula $\varsigma$.

CASE $\varsigma = \mathit{err}$: That $\gamma \models_X \mathit{err}$ holds implies $\gamma = \mathit{error}$. The relation $\gamma \doteq_V \gamma'$ then implies $\gamma' = \mathit{error}$, which implies $\gamma' \models_X \mathit{err}$. The reverse direction is identical with $\gamma$ and $\gamma'$ exchanged.

CASE $\varsigma = \mathit{final}$: That $\gamma \models_X \mathit{final}$ holds implies $\gamma = \mathit{final}(s, h)$ for some $s, h$. The relation $\gamma \doteq_V \gamma'$ then implies $\gamma' = \mathit{final}(s', h)$ where $s =_V s'$. This implies $\gamma' \models_X \mathit{final}$. The reverse direction is the same with $\gamma$ and $\gamma'$ exchanged.

CASE $\varsigma = \mathit{atloc}(l)$: That $\gamma \models_X \mathit{atloc}(l)$ holds implies $\gamma = \mathit{goto}(l, (s, h))$. The relation $\gamma \doteq_V \gamma'$ then implies $\gamma' = \mathit{goto}(l, (s', h))$ with $s =_V s'$. This implies $\gamma' \models_X \mathit{atloc}(l)$. The reverse direction is the same with $\gamma$ and $\gamma'$ exchanged.

CASE $\varsigma = Q$: That $\gamma \models_X Q$ holds implies $\gamma = (k, (s, h))$ or $\gamma = \mathit{goto}(l, (s, h))$ or $\gamma = \mathit{final}(s, h)$ and in each case $(s, h) \models_X Q$. We will consider the $\gamma = (k, (s, h))$ case. The others are similar. We have $\gamma \doteq_V \gamma'$, which implies that $\gamma' = (k', (s', h))$ where $s =_V s'$. By Lemma 4 and the fact that $\mathit{fv}(Q) \subseteq V$ we then have $(s', h) \models_X Q$. This implies $(k', (s', h)) \models_X Q$ according to the semantics given in Figure 3.2. □

Next, we have a similar result for $\circeq_V$.

Syntactic Description of $\circeq_V$-invariance

Definition 27. Let $\mathrm{LTSLP}(V)$ be the set of pure LTSL formulae with free variables in $V$. These are $\mathrm{LTSL}(V)$ formulae that do not contain subterms that are in the grammar for spatial predicates given in Figure 2.6. That is, they do not contain subterms of the form $\mathit{emp}$, $e_a \mapsto [\rho]$, or $p(\vec{e})$.

Theorem 16. If $\phi \in \mathrm{LTSLP}(V)$ then $\phi$ is $\circeq_V$-invariant.

Proof. The proof is similar to the proof for $\doteq_V$ above. We must show that if $\phi \in \mathrm{LTSLP}(V)$ then for all state formulae $\varsigma$ which are sub-formulae of $\phi$, we have

$$\forall \gamma, \gamma'.\ (\gamma \circeq_V \gamma') \Rightarrow ((\gamma \models_X \varsigma) \Leftrightarrow (\gamma' \models_X \varsigma)) \qquad (3.22)$$

The formula $\varsigma$ must have the form $\mathit{final}$, $\mathit{err}$, $\mathit{atloc}(l)$, or $Q$. The first three cases are identical to the corresponding cases in the proof of Theorem 15 above. For $\varsigma = Q$, we have that $Q$ is pure since $Q$ is a sub-formula of $\phi$ and $\phi \in \mathrm{LTSLP}(V)$. Given the semantics of $\gamma \models_X \varsigma$ in the case where $\varsigma = Q$, showing condition (3.22) reduces to showing the following.

$$\text{if } Q \text{ is pure then } (s =_V s') \Rightarrow \forall h, h'.\ ((s, h) \models_X Q) \Leftrightarrow ((s', h') \models_X Q)$$

We show this by induction on $Q$, recalling that since $Q$ is pure, the base cases $Q = \mathit{emp}$, $Q = e_a \mapsto [\rho]$ and $Q = p(\vec{e})$ need not be considered.

CASE $Q = e_b$: In this case, the semantics of $Q$ is independent of the heap. The definition of $\models_X$ from Figure 2.7 tells us that $(s, h) \models_X Q$ iff $[\![e_b]\!]\, s = \mathit{true}$. By Lemma 1 we have that $[\![e_b]\!]\, s = [\![e_b]\!]\, s'$, which implies $(s, h) \models_X Q$ iff $(s', h') \models_X Q$.

CASE $Q = Q_1 * Q_2$: We have $(s, h) \models_X Q_1 * Q_2$ iff there exist $h_1, h_2$ such that $\mathit{dom}(h_1) \cap \mathit{dom}(h_2) = \emptyset$ and $h = h_1 \cup h_2$ and $(s, h_1) \models_X Q_1$ and $(s, h_2) \models_X Q_2$.



That $\mathit{fv}(Q) \subseteq V$ implies $\mathit{fv}(Q_1) \subseteq V$ and $\mathit{fv}(Q_2) \subseteq V$. This allows us to apply the induction hypothesis.

But we must first determine how to split the heap. We wish to show $(s', h') \models_X Q_1 * Q_2$ for an arbitrary $h'$. To do this, we must show that there exist $h_1', h_2'$ such that $\mathit{dom}(h_1') \cap \mathit{dom}(h_2') = \emptyset$ and $h' = h_1' \cup h_2'$ and $(s', h_1') \models_X Q_1$ and $(s', h_2') \models_X Q_2$. We let $h_1' = h'$ and let $h_2' = \{\}$. Clearly $\mathit{dom}(h_1') \cap \mathit{dom}(h_2') = \emptyset$ and $h' = h_1' \cup h_2'$. Our inductive hypothesis tells us that since $(s, h_1) \models_X Q_1$, we can conclude $(s', h_1') \models_X Q_1$, and similarly for $Q_2$. This completes the proof.

CASE $Q = Q_1 \wedge Q_2$: We have $(s, h) \models_X Q_1 \wedge Q_2$ iff $(s, h) \models_X Q_1$ and $(s, h) \models_X Q_2$. Again, $\mathit{fv}(Q) \subseteq V$ implies $\mathit{fv}(Q_1) \subseteq V$ and $\mathit{fv}(Q_2) \subseteq V$, allowing us to apply the inductive hypothesis to $(s, h) \models_X Q_1$, obtaining $(s', h') \models_X Q_1$ for an arbitrary $h'$ (and similarly for $(s', h') \models_X Q_2$). This implies our result.

CASE $Q = Q_1 \vee Q_2$: This case is very similar to the $*$ and $\wedge$ cases. We have $(s, h) \models_X Q_1 \vee Q_2$ iff $(s, h) \models_X Q_1$ or $(s, h) \models_X Q_2$. In either case, we have $\mathit{fv}(Q_i) \subseteq V$ and apply our inductive hypothesis to obtain $(s, h) \models_X Q_i$ iff $(s', h') \models_X Q_i$ for an arbitrary $h'$, which lets us conclude that $(s, h) \models_X Q$ iff $(s', h') \models_X Q$.

CASE $Q = (Q_1 \Rightarrow Q_2)$: We will consider the forward direction first and show that for all $h'$ we have $(s, h) \models_X (Q_1 \Rightarrow Q_2)$ implies $(s', h') \models_X (Q_1 \Rightarrow Q_2)$. Suppose $(s, h) \models_X (Q_1 \Rightarrow Q_2)$. Then by the definition of $\models_X$ given in Figure 2.7 we have $(s, h) \models_X Q_1$ implies $(s, h) \models_X Q_2$. Now, suppose $(s', h') \models_X Q_1$. Since $\mathit{fv}(Q) = \mathit{fv}(Q_1) \cup \mathit{fv}(Q_2)$ and $\mathit{fv}(Q) \subseteq V$, we have $\mathit{fv}(Q_1) \subseteq V$ and $\mathit{fv}(Q_2) \subseteq V$. This lets us apply our inductive hypothesis, obtaining $(s, h) \models_X Q_1$. This implies $(s, h) \models_X Q_2$ by our assumption, which, applying the inductive hypothesis again, gives us $(s', h') \models_X Q_2$. Thus, we have shown that $(s', h') \models_X Q_1$ implies $(s', h') \models_X Q_2$, which lets us conclude $(s', h') \models_X (Q_1 \Rightarrow Q_2)$. The proof of the backwards direction is symmetric, with $s$ and $s'$ interchanged.

CASE $Q = \exists x.\ Q'$: We consider the forward direction first. The relation $(s, h) \models_X Q$ implies there exists a $v$ such that $(s[x \mapsto v], h) \models_X Q'$. Consider the store $s'[x \mapsto v]$. Since $s =_V s'$, we have $s[x \mapsto v] =_{V \cup \{x\}} s'[x \mapsto v]$. We have that $\mathit{fv}(Q) = \mathit{fv}(Q') - \{x\}$ and $\mathit{fv}(Q) \subseteq V$, which implies $\mathit{fv}(Q') \subseteq V \cup \{x\}$. We can then apply our inductive hypothesis to $(s[x \mapsto v], h) \models_X Q'$, obtaining $(s'[x \mapsto v], h') \models_X Q'$ for an arbitrary $h'$. This implies $(s', h') \models_X \exists x.\ Q'$. The backward direction is symmetric, with $s$ and $s'$ interchanged.

CASE $Q = \forall x.\ Q'$: We consider the forward direction first. Let $h'$ be an arbitrary heap. The relation $(s, h) \models_X \forall x.\ Q'$ implies that for all $v$ we have $(s[x \mapsto v], h) \models_X Q'$. Consider an arbitrary $v'$. Instantiating $v$ above with $v'$ we have $(s[x \mapsto v'], h) \models_X Q'$. Since $s =_V s'$, we have $s[x \mapsto v'] =_{V \cup \{x\}} s'[x \mapsto v']$. We have that $\mathit{fv}(Q) = \mathit{fv}(Q') - \{x\}$ and $\mathit{fv}(Q) \subseteq V$, which implies $\mathit{fv}(Q') \subseteq V \cup \{x\}$. We can then apply our inductive hypothesis to $(s[x \mapsto v'], h) \models_X Q'$, obtaining $(s'[x \mapsto v'], h') \models_X Q'$. Since $v'$ was arbitrary, we conclude that for all $v'$ we have $(s'[x \mapsto v'], h') \models_X Q'$, which implies $(s', h') \models_X \forall x.\ Q'$. The backward direction is symmetric, with $s$ and $s'$ interchanged. □

    P =def
      L0 : goto L1;
      L1 : branch x != nil =>
                    t := x;
                    x := x.next;
                    free t;
                    goto L1,
                  x = nil => halt
           end

    P' =def
      L0 : goto L1;
      L1 : branch a > 0 =>
                    t := x;
                    x := x.next;
                    free t;
                    a := a - 1;
                    goto L1,
                  a = 0 => halt
           end

Figure 3.6: Two programs with traces related by $\approx_{\doteq_{\{x,t\}}}$.


3.3.2  Translating  Results  Obtained  By  Analyzing  Abstractions 

Corollary 1 stated the connection between $E$-stuttering trace containment and $E$-invariant
LTL\X properties. Given programs $P$ and $P'$ and preconditions $Q$ and $Q'$ such that
$\mathit{traces}((\!(P \mid Q)\!)) \lesssim_E \mathit{traces}((\!(P' \mid Q')\!))$, this allows us to take a property $\phi$, which we would
like to check for $(\!(P \mid Q)\!)$, and instead check that it holds of $(\!(P' \mid Q')\!)$. For example, in
Figure 3.6 we give two programs satisfying the following.

$$\mathit{traces}((\!(P \mid ls(a, x, nil))\!)) \simeq_{\{x,t\}} \mathit{traces}((\!(P' \mid ls(a, x, nil))\!))$$

Suppose we want to show that $P$ terminates. Termination corresponds to the LTSL property
$F(\mathit{final})$. We can check that this property holds of $P'$, which it does since variable
$a$ decreases during each iteration and is bounded below by 0. This then implies that $P$
satisfies $F(\mathit{final})$ and thus $P$ also terminates.
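
The termination argument can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the heap is modeled as a Python dictionary, `None` stands for nil, and the function names `run_p` and `run_p_prime` are hypothetical. For $P'$ only the numeric guard and the counter $a$ are modeled, since those are what the ranking argument depends on.

```python
def run_p(length):
    """Simulate the heap program P on a list with `length` cells.

    Returns the number of loop iterations executed before halting.
    """
    # Heap model (an assumption of this sketch): address -> next address.
    heap = {i: (i + 1 if i < length else None) for i in range(1, length + 1)}
    x = 1 if length > 0 else None
    steps = 0
    while x is not None:      # L1: branch x != nil
        t = x                 # t := x
        x = heap[x]           # x := x.next
        del heap[t]           # free t
        steps += 1
    return steps


def run_p_prime(length):
    """Simulate the numeric control of P', started with a equal to the list length."""
    a = length                # precondition ls(a, x, nil)
    steps = 0
    while a > 0:              # L1: branch a > 0
        previous = a
        a -= 1                # a := a - 1
        assert 0 <= a < previous   # a is a ranking function for the loop
        steps += 1
    return steps


for n in range(6):
    assert run_p(n) == run_p_prime(n)
```

Both loops execute the same number of iterations for each list length tried, which is what one would expect if the two programs are related as stated above; the simulation is of course no substitute for the proof.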

This approach, of stating a property of the original program and then proving it holds
of the abstraction, naturally leads one to consider properties stated over the free variables
of the original program. However, it can also be useful to consider properties involving the
variables that occur in the abstraction, but not in the original program ($a$ is an example of
such a variable in $P'$). We could ask a static analysis to analyze $P'$ and return an invariant
that holds at $L_1$. Such an invariant may involve variables in $P'$ that are not in $P$ and thus the
property may not hold of $P$. For example, the property $G(\mathit{atloc}(L_1) \wedge ls(a, x, nil))$ holds
of $(\!(P' \mid ls(a, x, nil))\!)$. However, since the variable $a$ is not updated by $P$, this property does
not hold of $P$, even when started from the same set of initial states.

We can, however, translate the property that holds of $P'$ to a property that holds of
$P$ by accounting for the fact that the variable $a$ is not updated by $P$. By existentially
quantifying $a$, we capture the fact that there is a value of $a$ that makes the property true,
without requiring $a$ to actually be updated with the appropriate value. The property that
holds of $P$ then becomes $G(\mathit{atloc}(L_1) \wedge \exists a.\, ls(a, x, nil))$.

This mode of reasoning is captured by the following theorem, which allows us to
relate properties of $P'$ to properties of $P$ even when $P'$ includes variables not present in
$P$. First we define a function $\boxed{\exists}(V, \phi)$ which existentially quantifies the variables in $V$ in
all state formulae. We write $\exists V.\, Q$, where $V$ is a finite set of variables, to represent the
existential quantification of all variables in $V$ (that is, $(\exists V.\, Q) = (\exists v_1, v_2, \ldots, v_n.\, Q)$ if
$V = \{v_1, v_2, \ldots, v_n\}$).

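Definition 28. Let $V$ be a finite set of variables. Then $\boxed{\exists}(V, \phi)$ and $\boxed{\forall}(V, \phi)$ are defined
via mutual induction as given in Figure 3.7.

$$\boxed{\exists}(V, \varsigma) = \begin{cases} \exists V.\, Q & \text{if } \varsigma = Q \text{ for some } Q \\ \varsigma & \text{otherwise} \end{cases}
\qquad
\boxed{\forall}(V, \varsigma) = \begin{cases} \forall V.\, Q & \text{if } \varsigma = Q \text{ for some } Q \\ \varsigma & \text{otherwise} \end{cases}$$

$$\begin{aligned}
\boxed{\exists}(V, \phi_1 \wedge \phi_2) &= (\boxed{\exists}(V, \phi_1)) \wedge (\boxed{\exists}(V, \phi_2)) \\
\boxed{\exists}(V, \phi_1 \vee \phi_2) &= (\boxed{\exists}(V, \phi_1)) \vee (\boxed{\exists}(V, \phi_2)) \\
\boxed{\exists}(V, {\sim}\phi) &= {\sim}(\boxed{\forall}(V, \phi)) \\
\boxed{\exists}(V, G\phi) &= G(\boxed{\exists}(V, \phi)) \\
\boxed{\exists}(V, F\phi) &= F(\boxed{\exists}(V, \phi)) \\
\boxed{\exists}(V, \phi_1 \mathbin{U} \phi_2) &= (\boxed{\exists}(V, \phi_1)) \mathbin{U} (\boxed{\exists}(V, \phi_2))
\end{aligned}$$

$$\begin{aligned}
\boxed{\forall}(V, \phi_1 \wedge \phi_2) &= (\boxed{\forall}(V, \phi_1)) \wedge (\boxed{\forall}(V, \phi_2)) \\
\boxed{\forall}(V, \phi_1 \vee \phi_2) &= (\boxed{\forall}(V, \phi_1)) \vee (\boxed{\forall}(V, \phi_2)) \\
\boxed{\forall}(V, {\sim}\phi) &= {\sim}(\boxed{\exists}(V, \phi)) \\
\boxed{\forall}(V, G\phi) &= G(\boxed{\forall}(V, \phi)) \\
\boxed{\forall}(V, F\phi) &= F(\boxed{\forall}(V, \phi)) \\
\boxed{\forall}(V, \phi_1 \mathbin{U} \phi_2) &= (\boxed{\forall}(V, \phi_1)) \mathbin{U} (\boxed{\forall}(V, \phi_2))
\end{aligned}$$

Figure 3.7: Definition of $\boxed{\exists}$ and $\boxed{\forall}$.
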
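
The mutual induction in Figure 3.7 can be sketched as a recursive function over a small formula syntax tree. This is a minimal illustration, not the thesis's implementation: the class names (`StateFormula`, `Atom`, `Not`, `And`, `Until`, `Quant`) and the function `lift` are hypothetical, state formulae are kept as opaque strings, and only the core connectives are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateFormula:      # a state formula Q, kept opaque in this sketch
    text: str


@dataclass(frozen=True)
class Atom:              # atloc(l), err, or final
    name: str


@dataclass(frozen=True)
class Not:
    arg: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Until:
    left: object
    right: object


@dataclass(frozen=True)
class Quant:             # the result "exists V. Q" or "forall V. Q"
    kind: str            # "exists" or "forall"
    bound: frozenset
    body: StateFormula


def lift(kind, variables, phi):
    """Push the boxed quantifier `kind` over `variables` down to state formulae."""
    dual = "forall" if kind == "exists" else "exists"
    if isinstance(phi, StateFormula):
        return Quant(kind, frozenset(variables), phi)
    if isinstance(phi, Atom):
        return phi                                   # atloc, err, final are unchanged
    if isinstance(phi, Not):
        return Not(lift(dual, variables, phi.arg))   # negation swaps the quantifier
    if isinstance(phi, And):
        return And(lift(kind, variables, phi.left), lift(kind, variables, phi.right))
    if isinstance(phi, Until):
        return Until(lift(kind, variables, phi.left), lift(kind, variables, phi.right))
    raise TypeError(f"unsupported formula: {phi!r}")


q = StateFormula("ls(a, x, nil)")
result = lift("exists", {"a"}, Not(Until(Atom("final"), q)))
```

In this example the negation flips the quantifier, so the state formula under the `Until` ends up universally quantified, mirroring the ${\sim}$ rule of the figure.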


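Theorem 17. Suppose $T \simeq_V T'$ and let $V' = fv(\phi) - V$. Then $T' \models_X \phi$ implies
$T \models_X \boxed{\exists}(V', \phi)$ and $T' \not\models_X \phi$ implies $T \not\models_X \boxed{\forall}(V', \phi)$.

Corollary 2. Let $V' = fv(\phi) - V$. If $\mathit{traces}((\!(P \mid Q)\!)) \lesssim_V \mathit{traces}((\!(P' \mid Q')\!))$ and
$(\!(P' \mid Q')\!) \models_X \phi$ then $(\!(P \mid Q)\!) \models_X \boxed{\exists}(V', \phi)$.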

To the best of our knowledge, this theorem has not been stated before, perhaps because
most of the work on LTL\X makes minimal assumptions about the language of state formulae;
in particular, existential and universal quantification are not assumed to be present.

Before we proceed with the proof, we first establish the following lemma.

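Lemma 10. If $len(T') > 0$ and $T \simeq_V T'$ then $len(T) > 0$ and $T(0) =_V T'(0)$.

Proof. The conditions $len(T) > 0$ and $len(T') > 0$ are required for $T(0)$ and $T'(0)$ to be
defined. The proof proceeds as follows.

1. $T \simeq_V T'$ (Given)
2. $len(T') > 0$ (Given)
3. $\exists \alpha, \beta.\ \mathit{matches}(T, T', \alpha, \beta, =_V)$ (Def. of $\simeq_V$ (Def. 19))
4. $\mathit{matches}(T, T', \alpha, \beta, =_V)$ ($\exists$-elim)
5. $\alpha(0) = \beta(0) = 0$ (Def. of matches (Def. 18))
6. $\alpha(0) \le 0 < \alpha(1) \wedge \beta(0) \le 0 < \beta(1)$ (Above and $\alpha$, $\beta$ strictly increasing)
7. $\forall i, j, k.\ (\alpha(i) \le j < \alpha(i+1)) \wedge (\beta(i) \le k < \beta(i+1)) \Rightarrow (len(T) > 0 \Leftrightarrow len(T') > 0) \wedge (T(j) =_V T'(k))$ (Def. of matches)
8. $(len(T) > 0 \Leftrightarrow len(T') > 0) \wedge (T(0) =_V T'(0))$ ($\Rightarrow$-elim: above two lines)

□


$$\begin{aligned}
\boxed{\exists}(V, \phi_1 \vee \phi_2) &= \boxed{\exists}(V, {\sim}({\sim}\phi_1 \wedge {\sim}\phi_2)) \\
&= {\sim}(\boxed{\forall}(V, {\sim}\phi_1 \wedge {\sim}\phi_2)) \\
&= {\sim}(\boxed{\forall}(V, {\sim}\phi_1) \wedge \boxed{\forall}(V, {\sim}\phi_2)) \\
&= {\sim}({\sim}\boxed{\exists}(V, \phi_1) \wedge {\sim}\boxed{\exists}(V, \phi_2)) \\
&= (\boxed{\exists}(V, \phi_1)) \vee (\boxed{\exists}(V, \phi_2))
\end{aligned}$$

$$\begin{aligned}
\boxed{\exists}(V, F\phi) &= \boxed{\exists}(V, \mathit{true} \mathbin{U} \phi) \\
&= (\boxed{\exists}(V, \mathit{true})) \mathbin{U} (\boxed{\exists}(V, \phi)) \\
&= \mathit{true} \mathbin{U} (\boxed{\exists}(V, \phi)) \\
&= F(\boxed{\exists}(V, \phi))
\end{aligned}$$

$$\begin{aligned}
\boxed{\exists}(V, G\phi) &= \boxed{\exists}(V, {\sim}(F({\sim}\phi))) \\
&= {\sim}(F(\boxed{\forall}(V, {\sim}\phi))) \\
&= {\sim}(F({\sim}(\boxed{\exists}(V, \phi)))) \\
&= G(\boxed{\exists}(V, \phi))
\end{aligned}$$

Figure 3.8: Derivations showing that our definition of $\boxed{\exists}$ is consistent with the rewritings given
in Theorem 9. The corresponding derivations for $\boxed{\forall}$ are identical, with the symbols $\boxed{\exists}$ and $\boxed{\forall}$
interchanged.
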


We now present the proof of Theorem 17. We will only consider the core connectives
${\sim}$, $\wedge$, and $U$. To justify this simplification, we must show that Definition 28 is consistent
with the encoding of $\vee$, $F$, and $G$ in terms of these core connectives. This is demonstrated
by the derivations in Figure 3.8, where we first translate a formula into its core
representation as given by Theorem 9, then apply the definition of $\boxed{\exists}$, then rewrite the
result according to Theorem 9. The formula we obtain in the end should be the same as
that given by Definition 28. The corresponding derivations for $\boxed{\forall}$ are identical, with the
symbols $\boxed{\exists}$ and $\boxed{\forall}$ interchanged.




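Proof (of Theorem 17). The proof is by induction on the formula $\phi$. We have the following
assumptions.

$$T \simeq_V T' \qquad (3.23)$$
$$V' = fv(\phi) - V \qquad (3.24)$$

And we wish to show

$$T' \models_X \phi \text{ implies } T \models_X \boxed{\exists}(V', \phi)$$

and

$$T' \not\models_X \phi \text{ implies } T \not\models_X \boxed{\forall}(V', \phi).$$
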

Base Cases

We now consider the $\boxed{\exists}$ conjunct for the first three base cases, which are as follows.

$\phi = \mathit{atloc}(l)$
$\phi = \mathit{err}$
$\phi = \mathit{final}$

These are all proved in the same way. We present derivations for each base case, but
they all have the same structure. The final base case, $\phi = Q$, is presented last and the
structure of the proof is different in that case.

CASE $\phi = \mathit{atloc}(l)$:

1. $T' \models_X \mathit{atloc}(l)$ (Given)
2. $len(T') > 0 \wedge (T'(0) \models_X \mathit{atloc}(l))$ (Def. of $\models_X$ relation for path formulae (Figure 3.2))
3. $\exists s, h.\ T'(0) = \mathsf{goto}(l, (s, h))$ (Def. of $\models_X$ relation for state formulae (Figure 3.2))
4. $T'(0) = \mathsf{goto}(l, (s, h))$ ($\exists$-elim)
5. $len(T) > 0 \wedge (T(0) =_V T'(0))$ (Lemma 10: assumption (3.23) and line 2 conjunct 1)
6. $T(0) =_V \mathsf{goto}(l, (s, h))$ (Above and line 4)
7. $\exists s'.\ T(0) = \mathsf{goto}(l, (s', h)) \wedge s =_V s'$ (Def. of $=_V$ (Def. 23))
8. $T(0) \models_X \mathit{atloc}(l)$ (Def. of $\models_X$ (for state formulae))
9. $T \models_X \mathit{atloc}(l)$ (Def. of $\models_X$ (for path formulae): above and line 5 conjunct 1)
10. $T \models_X \boxed{\exists}(V', \mathit{atloc}(l))$ (Def. of $\boxed{\exists}$ (Def. 28))


CASE $\phi = \mathit{err}$:

1. $T' \models_X \mathit{err}$ (Given)
2. $len(T') > 0 \wedge (T'(0) \models_X \mathit{err})$ (Def. of $\models_X$ relation (Figure 3.2))
3. $T'(0) = \mathsf{error}$ (Def. of $\models_X$ relation (Figure 3.2))
4. $len(T) > 0 \wedge (T(0) =_V T'(0))$ (Lemma 10: assumption (3.23) and line 2 conjunct 1)
5. $T(0) =_V \mathsf{error}$ (Above and line 3)
6. $T(0) = \mathsf{error}$ (Def. of $=_V$ (Def. 23))
7. $T(0) \models_X \mathit{err}$ (Def. of $\models_X$ (for state formulae))
8. $T \models_X \mathit{err}$ (Def. of $\models_X$ (for path formulae): above and line 4 conjunct 1)
9. $T \models_X \boxed{\exists}(V', \mathit{err})$ (Def. of $\boxed{\exists}$ (Def. 28))



CASE $\phi = \mathit{final}$:

1. $T' \models_X \mathit{final}$ (Given)
2. $len(T') > 0 \wedge T'(0) \models_X \mathit{final}$ (Def. of $\models_X$ relation (Figure 3.2))
3. $\exists s, h.\ T'(0) = \mathsf{final}(s, h)$ (Def. of $\models_X$ relation (Figure 3.2))
4. $T'(0) = \mathsf{final}(s, h)$ ($\exists$-elim)
5. $len(T) > 0 \wedge (T(0) =_V T'(0))$ (Lemma 10: assumption (3.23) and line 2 conjunct 1)
6. $T(0) =_V \mathsf{final}(s, h)$ (Above and line 4)
7. $\exists s'.\ T(0) = \mathsf{final}(s', h) \wedge s =_V s'$ (Def. of $=_V$ (Def. 23))
8. $T(0) \models_X \mathit{final}$ (Def. of $\models_X$ (for state formulae))
9. $T \models_X \mathit{final}$ (Def. of $\models_X$ (for path formulae): above and line 5 conjunct 1)
10. $T \models_X \boxed{\exists}(V', \mathit{final})$ (Def. of $\boxed{\exists}$ (Def. 28))

CASE $\phi = Q$: We have that $T' \models_X Q$ and want to show that $T \models_X \boxed{\exists}(V', Q)$. The
definition of $\models_X$ states that our assumption $T' \models_X Q$ implies $len(T') > 0 \wedge T'(0) \models_X Q$.
We also have the assumption $T \simeq_V T'$ which, by Lemma 10, implies $len(T) > 0$ and
$T(0) =_V T'(0)$. We have by the definition of $\boxed{\exists}$ (Definition 28) that $\boxed{\exists}(V', Q) = \exists V'.\, Q$
and from the definition of $\models_X$ we have that $T \models_X \exists V'.\, Q$ iff $len(T) > 0$ and
$T(0) \models_X \exists V'.\, Q$. Thus, our goal reduces to showing that $T(0) \models_X \exists V'.\, Q$ based on
the assumptions $T'(0) \models_X Q$ and $T(0) =_V T'(0)$.

We now case split on the form of $T'(0)$. Based on the semantics of LTSL in Figure 3.2
and $T'(0) \models_X Q$ we have that $T'(0)$ either has the form $\langle k, (s, h) \rangle$, or $\mathsf{goto}(l, (s, h))$, or
$\mathsf{final}(s, h)$ and that whichever case holds, we have $(s, h) \models_X Q$. All the cases are proved
in the same way, so we will only show $\langle k, (s, h) \rangle$ here.

We have from $T(0) =_V T'(0)$ and $T'(0) = \langle k, (s, h) \rangle$ that $T(0) = \langle k', (s', h) \rangle$ for
some $s'$ such that $s' =_V s$. We want to show $(s', h) \models_X \exists V'.\, Q$, which will hold if we
can give some $s''$ that differs from $s'$ only on the values of variables in $V'$ and for which
$(s'', h) \models_X Q$ holds. The needed $s''$ is defined as follows.

$$s''(x) = \begin{cases} s'(x) & \text{if } x \notin V' \\ s(x) & \text{if } x \in V' \end{cases}$$

Clearly this $s''$ differs from $s'$ only in the values of variables in $V'$. We will show that
$(s'', h) \models_X Q$ by applying Lemma 4 to our assumption that $(s, h) \models_X Q$. In order to
apply this lemma, we must show that $s =_{fv(Q)} s''$. To do this, we consider an arbitrary
variable $x$ and show that if $x \in fv(Q)$ then $s(x) = s''(x)$. From $V' = fv(Q) - V$ and
$x \in fv(Q)$, we have that either $x \in V'$ or $x \in V$. If $x \in V'$ then we have $s''(x) = s(x)$
(our goal) from the definition of $s''$. If $x \in V$ then we have from $s =_V s'$ that $s(x) = s'(x)$.
Then, from the definition of $s''$ we have that either $s''(x) = s(x)$ (in which case we have
attained our goal) or $s''(x) = s'(x)$, in which case transitivity of equality with $s(x) = s'(x)$
gives us $s(x) = s''(x)$. Thus, we have $s =_{fv(Q)} s''$ and can apply Lemma 4, obtaining our
goal of $(s'', h) \models_X Q$ and completing the proof of this case.
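
The construction of the witness store $s''$ can be made concrete with a small sketch. The representation of stores as Python dictionaries, the function name `merge_store`, and the sample values are all assumptions made for illustration; the sketch assumes both stores have the same domain.

```python
def merge_store(s, s_prime, v_prime):
    """Build s'' with s''(x) = s(x) for x in V' and s''(x) = s'(x) otherwise."""
    return {x: (s[x] if x in v_prime else s_prime[x]) for x in s_prime}


# Hypothetical instance: Q mentions x and a, the stores agree on V = {x}.
fv_q = {"x", "a"}
v = {"x"}
v_prime = fv_q - v                     # V' = fv(Q) - V = {a}
s = {"x": 7, "a": 3, "t": 0}           # store of the abstraction's state
s_prime = {"x": 7, "a": 0, "t": 5}     # store of the original program's state

s_double_prime = merge_store(s, s_prime, v_prime)

# s'' agrees with s on every free variable of Q ...
agrees_on_fv = all(s_double_prime[y] == s[y] for y in fv_q)
# ... and differs from s' only on variables in V'.
differs_only_on_v_prime = all(
    s_double_prime[y] == s_prime[y] for y in s_prime if y not in v_prime
)
```

The two checks at the end correspond to the two facts used in the proof: $s =_{fv(Q)} s''$, and $s''$ being an admissible witness for the quantifier $\exists V'$.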

We now show the base cases for the $\boxed{\forall}$ conjunct. They are similar to the $\boxed{\exists}$ cases except
that since our assumption involves the $\models_X$ relation not holding, there is some disjunction
involved. In particular, a trace can fail to satisfy a state formula either by being empty or by
being non-empty with a first state that is not of the appropriate form. This is demonstrated
by the following derivation.

1. $T' \not\models_X \varsigma$ (Given)
2. $\neg(len(T') > 0 \wedge (T'(0) \models_X \varsigma))$ (Def. of $\models_X$)
3. $\neg(len(T') > 0) \vee (T'(0) \not\models_X \varsigma)$ (Boolean reasoning)

The empty cases are all handled uniformly. We show the derivation for these below.

1. $\neg(len(T') > 0)$ (Given)
2. $\exists \alpha, \beta.\ \mathit{matches}(T, T', \alpha, \beta, =_V)$ (Assumption (3.23) and Def. of $\simeq_V$ (Def. 19))
3. $\mathit{matches}(T, T', \alpha, \beta, =_V)$ ($\exists$-elim)
4. $\alpha(0) = \beta(0) = 0$ (Def. of matches (Def. 18))
5. $\alpha(0) \le 0 < \alpha(1) \wedge \beta(0) \le 0 < \beta(1)$ (Above and $\alpha$, $\beta$ strictly increasing)
6. $\forall i, j, k.\ (\alpha(i) \le j < \alpha(i+1)) \wedge (\beta(i) \le k < \beta(i+1)) \Rightarrow (j < len(T) \Leftrightarrow k < len(T'))$ (Def. of matches)
7. $(\alpha(0) \le 0 < \alpha(1)) \wedge (\beta(0) \le 0 < \beta(1)) \Rightarrow (0 < len(T) \Leftrightarrow 0 < len(T'))$ ($\forall$-elim, $i, j, k = 0$)
8. $0 < len(T) \Leftrightarrow 0 < len(T')$ ($\Rightarrow$-elim: above and line 5)
9. $\neg(len(T) > 0)$ (Above and line 1)
10. $T \not\models_X \boxed{\forall}(V', \varsigma)$ (Def. of $\models_X$ for path formulae)

This leaves us with the task of showing that $T'(0) \not\models_X \varsigma$ implies $T(0) \not\models_X \boxed{\forall}(V', \varsigma)$
under the assumption that $len(T') > 0$ and $len(T) > 0$. As before, Lemma 10 gives us
that $T(0) =_V T'(0)$. We consider each base case, starting with $\varsigma = \mathit{err}$.

CASE $\varsigma = \mathit{err}$:

1. $len(T) > 0$ (Given)
2. $len(T') > 0$ (Given)
3. $T(0) =_V T'(0)$ (Given)
4. $T'(0) \not\models_X \mathit{err}$ (Given)
5. $T'(0) \ne \mathsf{error}$ (Def. of $\models_X$)
6. $(T'(0) = \mathsf{final}(s, h)) \vee (T'(0) = \mathsf{goto}(l, (s, h))) \vee (T'(0) = \langle k, (s, h) \rangle)$ (Case analysis)

At this point, the reasoning is the same for each disjunct. We show $T'(0) = \mathsf{final}(s, h)$
as an example.

7. $T'(0) = \mathsf{final}(s, h)$ (Given)
8. $T(0) =_V \mathsf{final}(s, h)$ (Above and line 3)
9. $T(0) = \mathsf{final}(s', h) \wedge s' =_V s$ (Def. of $=_V$ (Def. 23))
10. $T(0) \ne \mathsf{error}$ (Def. of $=$ (syntactic equality))
11. $T(0) \not\models_X \mathit{err}$ (Def. of $\models_X$ for state formulae)
12. $T(0) \not\models_X \boxed{\forall}(V', \mathit{err})$ (Def. of $\boxed{\forall}$)

CASE $\varsigma = \mathit{final}$:

1. $len(T) > 0$ (Given)
2. $len(T') > 0$ (Given)
3. $T(0) =_V T'(0)$ (Given)
4. $T'(0) \not\models_X \mathit{final}$ (Given)
5. $\forall s, h.\ T'(0) \ne \mathsf{final}(s, h)$ (Def. of $\models_X$)

We now begin a proof by contradiction aimed at showing that $T(0) \ne \mathsf{final}(s, h)$ for
all $s$, $h$.

6. $T(0) = \mathsf{final}(s', h')$ (Assumption)
7. $\mathsf{final}(s', h') =_V T'(0)$ (Above and line 3)
8. $T'(0) = \mathsf{final}(s'', h') \wedge s'' =_V s'$ (Def. of $=_V$)
9. $T'(0) \ne \mathsf{final}(s'', h')$ ($\forall$-elim, line 5)
10. false (Previous two lines)
11. $(T(0) = \mathsf{final}(s', h')) \Rightarrow$ false ($\Rightarrow$-intro, line 6 and above)
12. $T(0) \ne \mathsf{final}(s', h')$ (Boolean reasoning)
13. $\forall s', h'.\ T(0) \ne \mathsf{final}(s', h')$ ($\forall$-intro)
14. $T(0) \not\models_X \mathit{final}$ (Def. of $\models_X$ for state formulae)
15. $T(0) \not\models_X \boxed{\forall}(V', \mathit{final})$ (Def. of $\boxed{\forall}$)

CASE $\varsigma = \mathit{atloc}(l)$:

1. $len(T) > 0$ (Given)
2. $len(T') > 0$ (Given)
3. $T(0) =_V T'(0)$ (Given)
4. $T'(0) \not\models_X \mathit{atloc}(l)$ (Given)
5. $\forall s, h.\ T'(0) \ne \mathsf{goto}(l, (s, h))$ (Def. of $\models_X$)

We now begin a proof by contradiction aimed at showing that $T(0) \ne \mathsf{goto}(l, (s, h))$ for
all $s$, $h$.

6. $T(0) = \mathsf{goto}(l, (s', h'))$ (Assumption)
7. $\mathsf{goto}(l, (s', h')) =_V T'(0)$ (Above and line 3)
8. $T'(0) = \mathsf{goto}(l, (s'', h')) \wedge s'' =_V s'$ (Def. of $=_V$)
9. $T'(0) \ne \mathsf{goto}(l, (s'', h'))$ ($\forall$-elim, line 5)
10. false (Previous two lines)
11. $(T(0) = \mathsf{goto}(l, (s', h'))) \Rightarrow$ false ($\Rightarrow$-intro, line 6 and above)
12. $T(0) \ne \mathsf{goto}(l, (s', h'))$ (Boolean reasoning)
13. $\forall s', h'.\ T(0) \ne \mathsf{goto}(l, (s', h'))$ ($\forall$-intro)
14. $T(0) \not\models_X \mathit{atloc}(l)$ (Def. of $\models_X$ for state formulae)
15. $T(0) \not\models_X \boxed{\forall}(V', \mathit{atloc}(l))$ (Def. of $\boxed{\forall}$)



CASE $\varsigma = Q$: This case is structured as a proof by contradiction. We have $T(0) =_V T'(0)$
and $T'(0) \not\models_X Q$. We will show that from $T(0) \models_X \boxed{\forall}(V', Q)$ we can derive a contradiction,
leading us to conclude that our goal formula $T(0) \not\models_X \boxed{\forall}(V', Q)$ must hold.

Since $\boxed{\forall}(V', Q) = \forall V'.\, Q$, the assumption $T(0) \models_X \boxed{\forall}(V', Q)$ implies that
$T(0) \models_X \forall V'.\, Q$. We now case split on the form of $T(0)$, which must be either $\mathsf{final}(s, h)$,
$\mathsf{goto}(l, (s, h))$, or $\langle k, (s, h) \rangle$. As these are all handled the same way (only the $s, h$ portion
is important), we will only consider $\langle k, (s, h) \rangle$ here.

From $T(0) =_V T'(0)$ and $T(0) = \langle k, (s, h) \rangle$ we have $T'(0) = \langle k', (s', h) \rangle$ such that
$s' =_V s$. The assumption $T(0) \models_X \forall V'.\, Q$ implies that $(s, h) \models_X \forall V'.\, Q$, which implies
that for all $s''$ such that $s''$ and $s$ differ only in the values assigned to variables in $V'$, we
have $(s'', h) \models_X Q$. In particular, we will consider the $s''$ given below.

$$s''(x) = \begin{cases} s(x) & \text{if } x \notin V' \\ s'(x) & \text{if } x \in V' \end{cases}$$

We will now derive a contradiction from $(s'', h) \models_X Q$ and $T'(0) \not\models_X Q$ and $s' =_V s$. We
start by proving $s'' =_{fv(Q)} s'$. Suppose $x \in fv(Q)$. Then since $V' = fv(Q) - V$ we have
either $x \in V'$ or $x \in V$. If $x \in V'$ then by the definition of $s''$ we have $s''(x) = s'(x)$,
which is our goal. If $x \in V$ then we can establish $s''(x) = s'(x)$ regardless of which case
of the $s''$ definition we are in. If $s''(x) = s(x)$, then by $s' =_V s$ we have $s'(x) = s(x)$
and thus $s''(x) = s'(x)$. If $s''(x) = s'(x)$ then this is already our goal formula and we are
done.

Now that we have shown $s'' =_{fv(Q)} s'$ we can apply Lemma 4 to our assumption of
$(s'', h) \models_X Q$ to obtain $(s', h) \models_X Q$. Recall that $T'(0) = \langle k', (s', h) \rangle$. The definition of
$\models_X$ then gives us that $T'(0) \models_X Q$. But this contradicts the assumption $T'(0) \not\models_X Q$.


Inductive  Cases 


We  now  consider  the  connectives  that  operate  on  path  formulae.  These  constitute  the 
inductive  cases.  We  consider  only  the  core  connectives,  as  justified  by  the  derivations  in 
Figure  3.8  and  Theorem  9. 


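CASE 3 [${\sim}\phi$]

CASE 3.1 [$\boxed{\exists}$ conjunct]

1. $T' \models_X {\sim}\phi$ (Assumption)
2. $T' \not\models_X \phi$ (Semantics of ${\sim}$ (Figure 3.2))
3. $T \not\models_X \boxed{\forall}(V', \phi)$ (Inductive Hypothesis)
4. $T \models_X {\sim}(\boxed{\forall}(V', \phi))$ (Semantics of ${\sim}$ (Figure 3.2))
5. $T \models_X \boxed{\exists}(V', {\sim}\phi)$ (Def. of $\boxed{\exists}$ (Def. 28))

CASE 3.2 [$\boxed{\forall}$ conjunct] This case is the dual of the above case.

1. $T' \not\models_X {\sim}\phi$ (Assumption)
2. $T' \models_X \phi$ (Semantics of ${\sim}$ (Figure 3.2))
3. $T \models_X \boxed{\exists}(V', \phi)$ (Inductive Hypothesis)
4. $T \not\models_X {\sim}(\boxed{\exists}(V', \phi))$ (Semantics of ${\sim}$ (Figure 3.2))
5. $T \not\models_X \boxed{\forall}(V', {\sim}\phi)$ (Def. of $\boxed{\forall}$ (Def. 28))
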


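CASE 4 [$\phi_1 \wedge \phi_2$]

CASE 4.1 [$\boxed{\exists}$ conjunct]

1. $T' \models_X \phi_1 \wedge \phi_2$ (Assumption)
2. $T' \models_X \phi_1$ and $T' \models_X \phi_2$ (Semantics of $\wedge$ (Figure 3.2))
3. $T \models_X \boxed{\exists}(V', \phi_1)$ and $T \models_X \boxed{\exists}(V', \phi_2)$ (Inductive Hypothesis)
4. $T \models_X \boxed{\exists}(V', \phi_1) \wedge \boxed{\exists}(V', \phi_2)$ (Semantics of $\wedge$ (Figure 3.2))
5. $T \models_X \boxed{\exists}(V', \phi_1 \wedge \phi_2)$ (Def. of $\boxed{\exists}$ (Def. 28))

CASE 4.2 [$\boxed{\forall}$ conjunct]

1. $T' \not\models_X \phi_1 \wedge \phi_2$ (Assumption)
2. $T' \not\models_X \phi_1$ or $T' \not\models_X \phi_2$ (Semantics of $\wedge$ (Figure 3.2))

Without loss of generality, we assume that the $T' \not\models_X \phi_1$ case holds. The other case is
identical.

3. $T' \not\models_X \phi_1$ (Given)
4. $T \not\models_X \boxed{\forall}(V', \phi_1)$ (Inductive Hypothesis)
5. $T \not\models_X \boxed{\forall}(V', \phi_1) \wedge \boxed{\forall}(V', \phi_2)$ (Semantics of $\wedge$ (Figure 3.2))
6. $T \not\models_X \boxed{\forall}(V', \phi_1 \wedge \phi_2)$ (Def. of $\boxed{\forall}$ (Def. 28))
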


CASE 5 [$\phi_1 \mathbin{U} \phi_2$]

CASE 5.1 [$\boxed{\exists}$ conjunct]

1. $T' \models_X \phi_1 \mathbin{U} \phi_2$ (Assumption)
2. $\exists i.\ 0 \le i < len(T') \wedge (T'_i \models_X \phi_2) \wedge (\forall j.\ 0 \le j < i \Rightarrow T'_j \models_X \phi_1)$ (Semantics of $U$ (Figure 3.2))
3. $(0 \le i < len(T')) \wedge (T'_i \models_X \phi_2) \wedge (\forall j.\ 0 \le j < i \Rightarrow T'_j \models_X \phi_1)$ ($\exists$-elim)

We first establish that there is a $T_k$ such that $T_k \models_X \boxed{\exists}(V', \phi_2)$.

4. $T' \simeq_V T$ (Assumption (3.23) and Theorem 11)
5. $T'_i \simeq_V T_{f(i)}$ (Lemma 8)
6. $(f(i) < len(T)) \Leftrightarrow (i < len(T'))$ (Def. of $\simeq_V$ (Def. 19) and Def. of matches (Def. 18))
7. $f(i) < len(T)$ (Above and line 3 first conjunct)
8. $0 \le f(i)$ ($f$ has type $\mathbb{N} \to \mathbb{N}$)
9. $0 \le f(i) < len(T)$ (Above two lines)
10. $T'_i \models_X \phi_2$ (3 second conjunct)
11. $T_{f(i)} \models_X \boxed{\exists}(V', \phi_2)$ (Induction Hypothesis: 5 and 10)
12. $(0 \le f(i) < len(T)) \wedge (T_{f(i)} \models_X \boxed{\exists}(V', \phi_2))$ ($\wedge$-intro, above and line 9)

We next show that for all $j$ such that $0 \le j < f(i)$ we have $T_j \models_X \boxed{\exists}(V', \phi_1)$.

13. $0 \le j < f(i)$ (Assumption)
14. $f^{-1}(j) < f^{-1}(f(i))$ (Lemma 8, monotonicity of $f^{-1}$)
15. $f^{-1}(f(i)) \le i$ (Lemma 8, composition of $f$ and $f^{-1}$)
16. $0 \le f^{-1}(j)$ ($f^{-1}$ has type $\mathbb{N} \to \mathbb{N}$)
17. $0 \le f^{-1}(j) < i$ (Previous three lines)
18. $T'_{f^{-1}(j)} \models_X \phi_1$ (line 3 last conjunct and 17)
19. $T_j \simeq_V T'_{f^{-1}(j)}$ (Lemma 8)
20. $T_j \models_X \boxed{\exists}(V', \phi_1)$ (Induction Hypothesis: 18, 19)
21. $0 \le j < f(i) \Rightarrow T_j \models_X \boxed{\exists}(V', \phi_1)$ (Imp. Intro.: lines 13 and 20)
22. $\forall j.\ 0 \le j < f(i) \Rightarrow T_j \models_X \boxed{\exists}(V', \phi_1)$ ($\forall$-introduction)
23. $\exists x.\ 0 \le x < len(T) \wedge T_x \models_X \boxed{\exists}(V', \phi_2) \wedge (\forall j.\ 0 \le j < x \Rightarrow T_j \models_X \boxed{\exists}(V', \phi_1))$ ($\exists$-intro with $x = f(i)$: lines 12 and 22)
24. $T \models_X (\boxed{\exists}(V', \phi_1)) \mathbin{U} (\boxed{\exists}(V', \phi_2))$ (Semantics of $U$ (Figure 3.2))
25. $T \models_X \boxed{\exists}(V', \phi_1 \mathbin{U} \phi_2)$ (Def. of $\boxed{\exists}$ (Def. 28))



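CASE 5.2 [$\boxed{\forall}$ conjunct]

1. $T' \not\models_X \phi_1 \mathbin{U} \phi_2$ (Assumption)
2. $\forall k.\ k \ge len(T') \vee T'_k \not\models_X \phi_2 \vee (\exists j.\ 0 \le j < k \wedge T'_j \not\models_X \phi_1)$ (Semantics of $U$ (Figure 3.2))

Let $p$ be an arbitrary natural number.

3. $T_p \simeq_V T'_{f(p)}$ (Lemma 8 and assumption (3.23))
4. $f(p) \ge len(T') \vee T'_{f(p)} \not\models_X \phi_2 \vee (\exists j.\ 0 \le j < f(p) \wedge T'_j \not\models_X \phi_1)$ (line 2 with $k = f(p)$)

Case 1: $f(p) \ge len(T')$

5. $(f(p) < len(T')) \Leftrightarrow (p < len(T))$ (Def. of $\simeq_V$ (Def. 19) and Def. of matches (Def. 18) and line 3)
6. $p \ge len(T)$ (Line 5 and this case assumption)

Case 2: $T'_{f(p)} \not\models_X \phi_2$

7. $T_p \not\models_X \boxed{\forall}(V', \phi_2)$ (Inductive Hypothesis: line 3 and this case assumption)

Case 3: $\exists j.\ 0 \le j < f(p) \wedge T'_j \not\models_X \phi_1$

8. $0 \le j < f(p) \wedge T'_j \not\models_X \phi_1$ ($\exists$-elim)
9. $f^{-1}(j) < f^{-1}(f(p))$ (Lemma 8, monotonicity of $f^{-1}$)
10. $T_{f^{-1}(j)} \simeq_V T'_j$ (Lemma 8 and assumption (3.23))
11. $f^{-1}(f(p)) \le p$ (Lemma 8, composition of $f$ and $f^{-1}$)
12. $0 \le f^{-1}(j) < p$ (lines 11 and 9 and $f^{-1}$ has type $\mathbb{N} \to \mathbb{N}$)
13. $T_{f^{-1}(j)} \not\models_X \boxed{\forall}(V', \phi_1)$ (Inductive hypothesis: line 8 conjunct 2 and line 10)
14. $\exists m.\ 0 \le m < p \wedge T_m \not\models_X \boxed{\forall}(V', \phi_1)$ ($\exists$-intro with $m = f^{-1}(j)$: lines 12 and 13)

We now combine the results from Cases 1, 2, and 3 to obtain the following disjunction.

15. $p \ge len(T) \vee T_p \not\models_X \boxed{\forall}(V', \phi_2) \vee \exists m.\ 0 \le m < p \wedge T_m \not\models_X \boxed{\forall}(V', \phi_1)$ ($\vee$-intro: lines 6, 7, and 14)
16. $\forall p.\ p \ge len(T) \vee T_p \not\models_X \boxed{\forall}(V', \phi_2) \vee \exists m.\ 0 \le m < p \wedge T_m \not\models_X \boxed{\forall}(V', \phi_1)$ ($\forall$-intro)
17. $T \not\models_X \boxed{\forall}(V', \phi_1) \mathbin{U} \boxed{\forall}(V', \phi_2)$ (Semantics of $U$ (Figure 3.2))
18. $T \not\models_X \boxed{\forall}(V', \phi_1 \mathbin{U} \phi_2)$ (Def. of $\boxed{\forall}$ (Def. 28))

□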


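We also have that the set of quantified variables can always be extended.

Lemma 11.

1. If $T \models_X \boxed{\exists}(V, \phi)$ and $V' \supseteq V$ then $T \models_X \boxed{\exists}(V', \phi)$.

2. If $T \not\models_X \boxed{\forall}(V, \phi)$ and $V' \supseteq V$ then $T \not\models_X \boxed{\forall}(V', \phi)$.

Proof. The proof is by induction on the structure of the formula $\phi$. The inductive cases
all follow directly from the inductive hypothesis, the definitions of $\boxed{\exists}$ and $\boxed{\forall}$, and the
semantics of LTSL operators. We give the example of $\phi = {\sim}\phi'$. Suppose $T \models_X \boxed{\exists}(V, {\sim}\phi')$.
Then by the definition of $\boxed{\exists}$ we have $T \models_X {\sim}(\boxed{\forall}(V, \phi'))$. This implies $T \not\models_X \boxed{\forall}(V, \phi')$.
Applying the inductive hypothesis, we have $T \not\models_X \boxed{\forall}(V', \phi')$. Applying the semantics
of $\models_X$ and the definition of $\boxed{\exists}$ to this formula gives us $T \models_X {\sim}(\boxed{\forall}(V', \phi'))$ and then
$T \models_X \boxed{\exists}(V', {\sim}\phi')$. This completes the proof of this case.

The proof for $\boxed{\forall}$ is dual ($\boxed{\exists}$ and $\boxed{\forall}$ are interchanged, as are $\models_X$ and $\not\models_X$). We start
from $T \not\models_X \boxed{\forall}(V, {\sim}\phi')$ and derive $T \not\models_X {\sim}(\boxed{\exists}(V, \phi'))$ and then $T \models_X \boxed{\exists}(V, \phi')$. The
inductive hypothesis gives us $T \models_X \boxed{\exists}(V', \phi')$. Applying the semantics of ${\sim}$ gives us
$T \not\models_X {\sim}(\boxed{\exists}(V', \phi'))$. Applying the definition of $\boxed{\forall}$ gives $T \not\models_X \boxed{\forall}(V', {\sim}\phi')$.

The base cases $\mathit{err}$, $\mathit{final}$, and $\mathit{atloc}(l)$ are all straightforward since if $\phi$ is one of these
formulae, we have $\boxed{\exists}(V, \phi) = \boxed{\forall}(V, \phi) = \phi$ for all sets of variables $V$.

The only interesting case is $\phi = Q$. In this case, the $\boxed{\exists}$ conjunct follows from the
fact that $\exists V.\, Q \Rightarrow \exists V'.\, Q$ if $V' \supseteq V$. Formally, we have $T \models_X \boxed{\exists}(V, Q)$. Applying
the definition of $\models_X$ gives us that $T(0) = \langle k, (s, h) \rangle$ or $T(0) = \mathsf{goto}(l, (s, h))$ or
$T(0) = \mathsf{final}(s, h)$. In all these cases we have $(s, h) \models_X \boxed{\exists}(V, Q)$, which is equivalent
to $(s, h) \models_X \exists V.\, Q$. At this point, we reason that for any $V'$ such that $V' \supseteq V$, we have
$(s, h) \models_X \exists V.\, Q$ implies $(s, h) \models_X \exists V'.\, Q$. Re-applying the definitions of $\boxed{\exists}$ and $\models_X$
we then derive $T(0) \models_X \boxed{\exists}(V', Q)$ and finally $T \models_X \boxed{\exists}(V', Q)$.

The $\boxed{\forall}$ case is similar except that we make use of the fact that if $V' \supseteq V$ then
$(s, h) \not\models_X \forall V.\, Q$ implies $(s, h) \not\models_X \forall V'.\, Q$. □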

3.3.3  Example 

Consider the example below, which iterates through a linked list.

    P ≝
    L0 : goto L1
    L1 : branch x ≠ nil ⇒
             x := x.next;
             goto L1,
         x = nil ⇒ halt
    end

A shape analysis such as those in [Berdine et al., 2007, Gotsman et al., 2007, Distefano
and Parkinson, 2008] might discover an invariant at $L_1$ similar to the one below, where
$ls(a, x, y)$ is the list segment predicate defined on page 69.

$$\exists a, b, x'.\ ls(a, x', x) * ls(b, x, nil)$$

This describes the shape of the heap (there are two linked list segments with $x$ pointing to
the head of the second segment) but includes no information about data structure sizes (the
size information is existentially quantified). We will call analyses producing invariants
such as this shape-focused analyses in recognition of the fact that they focus on shape
invariants and support little, if any, reasoning about size (some analyses do keep limited
size information by tracking whether a data structure is empty).

We can use the addition of extra variables and Corollary 2 to generate invariants that are
more precise than those generated by a shape-focused analysis. In the following program
we have included statements modifying variables $a$ and $b$ (we will show how to generate
such a program in Chapter 4 and how to automate this process in Chapter 5).

    P' ≝
    L0 : a := 0; b := n; goto L1
    L1 : branch x ≠ nil ⇒
             x := x.next;
             a := a + 1;
             b := b - 1;
             goto L1,
         x = nil ⇒ halt
    end

We have the following relationship between $P$ and $P'$.

$$\mathit{traces}((\!(P \mid ls(n, x, nil))\!)) \simeq_{\{x\}} \mathit{traces}((\!(P' \mid ls(n, x, nil))\!))$$

Note that the precondition assumes the existence of a program variable $n$ which initially
contains the length of the list at $x$. We can prove that the following LTSL property holds
of $(\!(P' \mid ls(n, x, nil))\!)$.

$$G\big(\mathit{atloc}(L_1) \supset (\exists x'.\ (ls(a, x', x) * ls(b, x, nil)) \wedge a + b = n)\big)$$

By Corollary 2 we then have that the following property holds of $(\!(P \mid ls(n, x, nil))\!)$.

$$G\big(\mathit{atloc}(L_1) \supset (\exists a, b, x'.\ (ls(a, x', x) * ls(b, x, nil)) \wedge a + b = n)\big)$$

The invariant at $L_1$ now expresses that the sum of the lengths of the list segments ($a + b$)
is always equal to $n$.
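
The instrumented invariant can be checked on concrete runs. The following sketch simulates the instrumented program on a list of length $n$ and asserts the invariant each time control reaches $L_1$. It is only an illustration: the dictionary heap model, the use of `None` for nil, the choice of the original list head as the witness for $x'$, and the names `check_invariant` and `seg_len` are assumptions of this sketch.

```python
def check_invariant(n):
    """Run the instrumented loop on a list of length n, checking a + b = n at L1.

    Returns the number of times control reaches L1.
    """
    # Heap model: address -> next address; None plays the role of nil.
    nxt = {i: (i + 1 if i < n else None) for i in range(1, n + 1)}
    head = 1 if n > 0 else None          # witness for the quantified x'

    def seg_len(start, end):
        """Length of the list segment from start up to (not including) end."""
        count = 0
        while start != end:
            start = nxt[start]
            count += 1
        return count

    x, a, b = head, 0, n                 # L0: a := 0; b := n
    visits = 0
    while True:
        # At L1: ls(a, x', x) * ls(b, x, nil) with a + b = n.
        assert seg_len(head, x) == a
        assert seg_len(x, None) == b
        assert a + b == n
        visits += 1
        if x is None:                    # x = nil => halt
            return visits
        x = nxt[x]                       # x := x.next
        a += 1                           # a := a + 1
        b -= 1                           # b := b - 1


visit_counts = [check_invariant(n) for n in range(5)]
```

For a list of length $n$ the loop head is reached $n + 1$ times, and the assertions hold at each visit for the lengths tried here.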

In  Chapter  5  we  will  show  that  by  using  this  approach  to  verification,  we  can  easily 
extend  a  shape-focused  analysis  to  an  analysis  that  also  supports  reasoning  about  integer 
invariants.  Furthermore,  we  can  decompose  the  verification  process  in  a  way  that  allows 
the integer reasoning to occur independently of the shape reasoning.

3.4 Stuttering Simulation

In the previous sections, we presented some examples of programs that produce stuttering
equivalent traces, as well as programs whose trace sets obey a stuttering containment
relation. But we have not shown how to prove that the trace set of one program stuttering
contains that of another. In this section, we introduce the concept of stuttering simulation
relations and show how these can be used to prove that one program is an abstraction of
another with respect to some equality relation on states. The definition below is based
on Definition 4 from [Manolios, 2001] and corresponds to the concept of well-founded
simulation (the well-foundedness referring to the rank functions that are involved in the
definition).

Definition 29. Given transition systems $S_1 = (A_1, I_1, F_1, \dashrightarrow^1)$ and $S_2 = (A_2, I_2, F_2, \dashrightarrow^2)$,
we say that $S_2$ $E$-stuttering simulates $S_1$ iff there exists a relation $R$ between the states of
$S_1$ and $S_2$ that satisfies the following conditions.

1. (Initial States Related) $\forall a_1 \in I_1.\ \exists a_2 \in I_2.\ a_1 \mathrel{R} a_2$

2. ($E$-equivalent) $\forall a_1, a_2.\ (a_1 \mathrel{R} a_2) \Rightarrow (a_1 \mathrel{E} a_2)$

3. (Transitions Match) There exist ranking functions $\mathit{rankt} : A_1 \times A_2 \to \mathbb{N}$ and
$\mathit{rankl} : A_2 \times A_1 \times A_1 \to \mathbb{N}$ such that for all $a_1, a_2$, if $a_1 \mathrel{R} a_2$ and $a_1 \dashrightarrow^1 a_1'$ then
one of the following holds:

(a) ($S_2$ Matches) $\exists a_2'.\ (a_2 \dashrightarrow^2 a_2') \wedge (a_1' \mathrel{R} a_2')$

(b) ($S_1$ Stutters) $(a_1' \mathrel{R} a_2) \wedge (\mathit{rankt}(a_1', a_2) < \mathit{rankt}(a_1, a_2))$

(c) ($S_2$ Stutters) $\exists a_2'.\ (a_2 \dashrightarrow^2 a_2') \wedge (a_1 \mathrel{R} a_2') \wedge (\mathit{rankl}(a_2', a_1, a_1') < \mathit{rankl}(a_2, a_1, a_1'))$

4. (Final States Related) If $a_1 \mathrel{R} a_2$ then $a_1 \in F_1 \Leftrightarrow a_2 \in F_2$.

We call $R$ an $E$-stuttering simulation relation and write $S_1 \sqsubseteq^R_E S_2$ to indicate that $R$ is
an $E$-stuttering simulation relation relating $S_1$ and $S_2$. We will also state the existence of
such an $R$ using the phrase "$S_2$ $E$-stuttering simulates $S_1$".

Note that the definition allows three types of behavior when $S_1$ can take a step (conditions
3a, 3b, and 3c). The first corresponds to the standard requirement of simulation relations and
specifies that the transition system on the right can match the step that the system on the
left makes. The second and third conditions are what classify this definition as stuttering
simulation. These conditions allow for cases where only one of the systems takes a step. In
such cases the system making the transition is said to “stutter,” since the pre- and
post-states of the transition are both E-equivalent to the same state of the other system.
Thus, the state is repeated (with respect to the equivalence E), which is the connection with
the common usage of “stutter” as the generation of repeated words or sounds. We include the
conditions involving rankt and rankl to ensure that one system cannot stutter infinitely.
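
For finite transition systems, the four conditions of the definition can be checked directly. The following is a minimal sketch, not part of the formal development: the dictionary encoding of a transition system, the function name `is_stuttering_simulation`, and the two-state example are all assumptions made for illustration, and condition 4 is read as an equivalence.

```python
def is_stuttering_simulation(s1, s2, R, E, rankt, rankl):
    """Check conditions 1-4 for finite transition systems.

    s1, s2 : dicts with keys "init" and "final" (sets of states) and
             "step" (dict mapping a state to the set of its successors).
    R, E   : sets of (a1, a2) pairs, a1 a state of s1 and a2 a state of s2.
    rankt  : function (a1, a2) -> natural number.
    rankl  : function (a2, a1, a1_next) -> natural number.
    """
    # 1. Every initial state of s1 is related to some initial state of s2.
    if not all(any((a1, a2) in R for a2 in s2["init"]) for a1 in s1["init"]):
        return False
    # 2. R-related states are E-equivalent.
    if not R <= E:
        return False
    for a1, a2 in R:
        # 4. Related states agree on being final (assumed to be an equivalence).
        if (a1 in s1["final"]) != (a2 in s2["final"]):
            return False
        succ2 = s2["step"].get(a2, set())
        for b1 in s1["step"].get(a1, set()):
            # 3a: s2 can match the step.
            s2_matches = any((b1, b2) in R for b2 in succ2)
            # 3b: s1 stutters and rankt strictly decreases.
            s1_stutters = (b1, a2) in R and rankt(b1, a2) < rankt(a1, a2)
            # 3c: s2 stutters and rankl strictly decreases.
            s2_stutters = any(
                (a1, b2) in R and rankl(b2, a1, b1) < rankl(a2, a1, b1)
                for b2 in succ2
            )
            if not (s2_matches or s1_stutters or s2_stutters):
                return False
    return True


# Hypothetical example: the concrete system needs two steps where the
# abstract system needs one, so the first concrete step is a stutter.
concrete = {"init": {"c0"}, "final": {"c2"}, "step": {"c0": {"c1"}, "c1": {"c2"}}}
abstract = {"init": {"a0"}, "final": {"a1"}, "step": {"a0": {"a1"}}}
R = {("c0", "a0"), ("c1", "a0"), ("c2", "a1")}
steps_left = {"c0": 2, "c1": 1, "c2": 0}

ok = is_stuttering_simulation(
    concrete, abstract, R, R,
    rankt=lambda a1, a2: steps_left[a1],
    rankl=lambda a2, a1, b1: 0,
)
not_ok = is_stuttering_simulation(
    concrete, abstract, R, R,
    rankt=lambda a1, a2: 0,  # a constant rank cannot justify the stutter
    rankl=lambda a2, a1, b1: 0,
)
```

In this example the check succeeds only when rankt strictly decreases across the stuttering step, which is the role the ranking functions play in ruling out infinite stuttering.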

Given  this  definition  of  stuttering  simulation,  we  can  obtain  the  following  theorem, 
which  tells  us  that  stuttering  simulation  implies  stuttering  trace  containment.  The  fact  that 
we  prohibit  infinite  stuttering  is  important  here,  as  this  theorem  would  not  hold  without 
this  restriction. 

Theorem 18. If $\exists R.\ S \sqsubseteq_R^E S'$ then $\mathit{traces}(S) \preceq_E \mathit{traces}(S')$.

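Proof. (Adapted from the proof of Proposition 1 in [Manolios, 2001].) We assume that
$\exists R.\ S \sqsubseteq_R^E S'$ with $S = (A, I, F, \dashrightarrow)$ and
$S' = (A', I', F', \dashrightarrow')$. We must show the following.

\[ \forall T \in \mathit{traces}(S).\ \exists T' \in \mathit{traces}(S').\ T \preceq_E T' \]

The definition of $\preceq_E$ (Def. 19) states that this is equivalent to the following.

\[ \forall T \in \mathit{traces}(S).\ \exists T' \in \mathit{traces}(S').\ \exists \alpha, \beta.\ \mathit{matches}(T, T', \alpha, \beta, E) \]

We will assume $T \in \mathit{traces}(S)$ and give a definition of $T'$ such that
$T' \in \mathit{traces}(S')$ and the following holds.

\[ \exists \alpha, \beta.\ \mathit{matches}(T, T', \alpha, \beta, E) \tag{3.25} \]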
As we produce $T'$, we also define $\alpha$ and $\beta$. Recall that $\alpha$ and $\beta$
partition $T$ and $T'$ respectively into blocks of elements which are E-equivalent. Recall also
that $\alpha(i)$ gives the index of the start of block $i$ in trace $T$ (and similarly for
$\beta$ and $T'$). Formally, we must provide an $\alpha$ and $\beta$ satisfying the following
(obtained by expanding (3.25) according to Definition 18).

\[ \alpha(0) = \beta(0) = 0 \tag{3.26} \]

\[
\begin{aligned}
\forall i, j, k.\ & \alpha(i) \le j < \alpha(i+1) \land \beta(i) \le k < \beta(i+1) \Rightarrow \\
& (j < \mathit{len}(T) \Leftrightarrow k < \mathit{len}(T')) \land (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T'(k))
\end{aligned}
\tag{3.27}
\]


The definition of $\alpha$ and $\beta$ is by recursion on the block number. We assume we are
given $\alpha(i)$, $\beta(i)$, and from these define $\alpha(i+1)$ and $\beta(i+1)$. We also
assume that if $\alpha(i) < \mathit{len}(T)$ then we are provided with $T'(\beta(i))$ such that
$T(\alpha(i)) \mathrel{R} T'(\beta(i))$. If $\alpha(i) < \mathit{len}(T)$ we also build the
$i$th block of $T'$; that is, we define the elements $T'(k)$ where
$\beta(i) < k < \beta(i+1)$. These are defined so as to establish (3.27) for block $i$, which
can be split into the following two implications.

\[
\begin{aligned}
\forall j, k.\ & \alpha(i) \le j < \alpha(i+1) \land \beta(i) \le k < \beta(i+1) \Rightarrow \\
& (j < \mathit{len}(T)) \Leftrightarrow (k < \mathit{len}(T'))
\end{aligned}
\tag{3.28}
\]

\[
\begin{aligned}
\forall j, k.\ & \alpha(i) \le j < \alpha(i+1) \land \beta(i) \le k < \beta(i+1) \Rightarrow \\
& (j < \mathit{len}(T) \Rightarrow T(j) \mathrel{E} T'(k))
\end{aligned}
\tag{3.29}
\]

Finally, if $\alpha(i+1) < \mathit{len}(T)$ then we define $T'(\beta(i+1))$ such that it
satisfies $T(\alpha(i+1)) \mathrel{R} T'(\beta(i+1))$, thus ensuring that the assumptions for
generating the next block hold. We give a pictorial overview of the proof setup in Figure 3.9.


Base Case We start with the base case for $T'$, $\alpha$, and $\beta$. Condition (3.26)
requires us to set $\alpha(0) = 0$ and $\beta(0) = 0$. Next we define $T'(0)$ given $T(0)$. We
have from $S \sqsubseteq_R^E S'$ that $\forall a \in I.\ \exists a' \in I'.\ a \mathrel{R} a'$.
Since $T \in \mathit{traces}(S)$ we have that $T(0) \in I$. Thus,
$\exists a' \in I'.\ T(0) \mathrel{R} a'$. We set $T'(0)$ equal to this $a'$, thus giving us
$T(0) \mathrel{R} T'(0)$.
Figure 3.9: Pictorial overview of the proof of Theorem 18. The picture depicts how we build up
$T'$, $\alpha$, and $\beta$. Solid elements of the figure are given. These include
$\alpha(i)$, $\beta(i)$, the elements of $T$ and the fact that
$T(\alpha(i)) \mathrel{R} T'(\beta(i))$. The dashed elements are defined / proved in terms of
these givens. Definitions must be provided for $\alpha(i+1)$, $\beta(i+1)$, and the elements of
$T'$ from index $\beta(i)$ to $\beta(i+1)$. It must then be proved that
$T(\alpha(i+1)) \mathrel{R} T'(\beta(i+1))$ and that $T(a) \mathrel{R} T'(b)$ for all $a, b$
such that $\alpha(i) \le a < \alpha(i+1)$ and $\beta(i) \le b < \beta(i+1)$.


Recursive Case We break the proof of the recursive case into three sub-cases: either
$\alpha(i) < \mathit{len}(T) - 1$ (the trace $T$ contains at least two elements starting at
$\alpha(i)$) or $\alpha(i) = \mathit{len}(T) - 1$ (the element at $\alpha(i)$ is the last
element in the trace $T$) or $\alpha(i) > \mathit{len}(T) - 1$ (the index $\alpha(i)$ is past
the end of the trace $T$).

CASE 1 [$\alpha(i) = \mathit{len}(T) - 1$] If $\alpha(i)$ is the index of the last element in
the trace $T$, then we make $T'$ end at $\beta(i)$. The constraints on well-formed traces
ensure that since $\alpha(i)$ is the index of the last element in $T$, we have
$T(\alpha(i)) \in F$. From condition 4 in the definition of simulation, and the fact that
$T(\alpha(i)) \mathrel{R} T'(\beta(i))$, we have that $T'(\beta(i)) \in F'$, which ensures that
taking $T'(\beta(i))$ to be the last element of trace $T'$ results in a well-formed trace. We
set $\alpha(i+1) = \alpha(i) + 1$ and $\beta(i+1) = \beta(i) + 1$. We now must check (3.28) and
(3.29). We have $T(\alpha(i)) \mathrel{R} T'(\beta(i))$ which, by condition 2 of Definition 29,
implies $T(\alpha(i)) \mathrel{E} T'(\beta(i))$. This establishes (3.29). For equation (3.28),
we note that $\alpha(i) < \mathit{len}(T)$ and $\beta(i) < \mathit{len}(T')$ while
$\alpha(i+1) \ge \mathit{len}(T)$ and $\beta(i+1) \ge \mathit{len}(T')$. This, combined with
the fact that $\alpha(i+1) = \alpha(i) + 1$ and $\beta(i+1) = \beta(i) + 1$, is sufficient to
establish (3.28).
CASE 2 [$\alpha(i) > \mathit{len}(T) - 1$] In this case, we cannot satisfy the antecedent
$j < \mathit{len}(T)$ in (3.29). Thus, that formula holds vacuously. Our rule above for ending
the trace $T'$ ensured that $\alpha(i) \ge \mathit{len}(T) \Leftrightarrow \beta(i) \ge \mathit{len}(T')$,
so we can establish (3.28) regardless of what $\alpha(i+1)$ and $\beta(i+1)$ are set to ($j$
and $k$ in that formula will both index past the end of the trace). Essentially, we are past
the end of both traces, so the values of $\alpha$ and $\beta$ at this point are not relevant.
Since we are free to set them to any values provided the functions remain strictly increasing,
we choose $\alpha(i+1) = \alpha(i) + 1$ and $\beta(i+1) = \beta(i) + 1$.

CASE 3 [$\alpha(i) < \mathit{len}(T) - 1$] If $T$ contains at least two elements starting at
$\alpha(i)$, then we have $T(\alpha(i)) \dashrightarrow T(\alpha(i) + 1)$. Since we also have
$S \sqsubseteq_R^E S'$ and $T(\alpha(i)) \mathrel{R} T'(\beta(i))$, then by Definition 29, we
know that either condition 3a, 3b, or 3c holds. We now case split on these possibilities.

CASE 3.1 [Condition 3a ($S'$ Matches)] In this case, we have that there exists an $a'$ such
that $(T'(\beta(i)) \dashrightarrow' a') \land (T(\alpha(i) + 1) \mathrel{R} a')$. Since each
transition system takes a step to new states which are related, we start a new block in each
trace. We set $\alpha(i+1) = \alpha(i) + 1$ and $\beta(i+1) = \beta(i) + 1$. We set
$T'(\beta(i+1)) = a'$. Applying these definitions to $T(\alpha(i) + 1) \mathrel{R} a'$, we
obtain $T(\alpha(i+1)) \mathrel{R} T'(\beta(i+1))$. Note that $T(\alpha(i))$ and
$T'(\beta(i))$ are the only elements in the $i$th block of $T$ and $T'$, respectively. We also
have $T(\alpha(i)) \mathrel{R} T'(\beta(i))$, and that R-relation implies E-equivalence
(condition 2 of Definition 29). These facts together are sufficient to prove (3.29). Equation
(3.28) follows from the fact that neither $T(\alpha(i))$ nor $T'(\beta(i))$ is the last
element in its respective trace.

CASE 3.2 [Condition 3b ($S$ Stutters)] We further assume that condition 3a does not hold
(otherwise, this situation would be handled by the case above). In this case, we have
$T(\alpha(i) + 1) \mathrel{R} T'(\beta(i))$ and
$\mathit{rankt}(T(\alpha(i) + 1), T'(\beta(i))) < \mathit{rankt}(T(\alpha(i)), T'(\beta(i)))$.
We will consider the longest sub-sequence of $T$ starting at index $\alpha(i)$ such that
condition 3b holds for consecutive elements, but condition 3a does not. This will be used to
define the $i$th block of $T'$.

Let $n$ be the maximum integer such that

\[
\forall l.\ 1 \le l \le n \Rightarrow
(T(\alpha(i) + l) \mathrel{R} T'(\beta(i))) \land
(\nexists a'.\ (T'(\beta(i)) \dashrightarrow' a') \land (T(\alpha(i) + l) \mathrel{R} a'))
\tag{3.30}
\]

Note that $n \ge 1$ since the above holds for the current step of $T$. Also, $n$ must be finite
due to the well-foundedness of rankt. We set $\alpha(i+1) = \alpha(i) + n + 1$ and
$\beta(i+1) = \beta(i) + 1$.

The value of $T'(\beta(i+1))$ depends on whether $T(\alpha(i) + n)$ is the last element of $T$.

CASE 3.2.1 [$T(\alpha(i) + n)$ is the last element of $T$] In this case, $T'(\beta(i))$ will be
the last element of $T'$ and we proceed as in CASE 1. From Definition 12 we have
$T(\alpha(i) + n) \in F$. We have $T(\alpha(i) + n) \mathrel{R} T'(\beta(i))$ from (3.30). By
condition 4 of Definition 29 we then have $T'(\beta(i)) \in F'$ and thus $T'(\beta(i))$ is a
valid last state for $T'$, so we leave $T'$ undefined past $\beta(i)$. We set
$\alpha(i+1) = \alpha(i) + n + 1$ and $\beta(i+1) = \beta(i) + 1$. By (3.30) we have
$T(\alpha(i) + l) \mathrel{R} T'(\beta(i))$ for $1 \le l \le n$ and thus
$T(\alpha(i) + l) \mathrel{E} T'(\beta(i))$, thus satisfying (3.29). Equation (3.28) follows
from the fact that $\alpha(i) + n$ is the last index of $T$ and $\beta(i)$ is the index of the
last element of $T'$.

CASE 3.2.2 [$T(\alpha(i) + n)$ is not the last element of $T$] In this case, we let
$\alpha(i+1) = \alpha(i) + n + 1$ and we have that
$T(\alpha(i) + n) \dashrightarrow T(\alpha(i) + n + 1)$. By (3.30) and the maximality of $n$,
we have that the consequent of (3.30) does not hold for $l = n + 1$. Thus, we have the
following.

\[
\neg (T(\alpha(i) + n + 1) \mathrel{R} T'(\beta(i))) \lor
(\exists a'.\ (T'(\beta(i)) \dashrightarrow' a') \land (T(\alpha(i) + n + 1) \mathrel{R} a'))
\tag{3.31}
\]

We can show that the second disjunct must be the one that holds. Because we have
$T(\alpha(i) + n) \mathrel{R} T'(\beta(i))$ and
$T(\alpha(i) + n) \dashrightarrow T(\alpha(i) + n + 1)$, then by Definition 29 either 3a, 3b,
or 3c must hold for the transition $T(\alpha(i) + n) \dashrightarrow T(\alpha(i) + n + 1)$ and
$T'(\beta(i))$.

• Condition (3a) corresponds exactly to the second disjunct in (3.31).

• Condition (3b) contradicts the first disjunct in (3.31), from which we conclude that
the second disjunct must hold in this case.

• Condition (3c) cannot hold. If it did, we would have
$\exists a'.\ (T'(\beta(i)) \dashrightarrow' a') \land (T(\alpha(i) + n) \mathrel{R} a')$,
which contradicts (3.30).
Thus, we have

\[ \exists a'.\ (T'(\beta(i)) \dashrightarrow' a') \land (T(\alpha(i) + n + 1) \mathrel{R} a') \]

Let $a'$ be the element described by the formula above. We set
$\alpha(i+1) = \alpha(i) + n + 1$ and set $\beta(i+1) = \beta(i) + 1$. We set
$T'(\beta(i+1)) = a'$. We have (3.28) since neither sequence is ending. We have (3.29) from
assumption (3.30) and the fact that R-relation implies E-equivalence. We have
$T(\alpha(i+1)) \mathrel{R} T'(\beta(i+1))$ from the assumption that
$T(\alpha(i) + n + 1) \mathrel{R} a'$.

CASE 3.3 [Only condition 3c ($S'$ Stutters) applies] This proceeds similarly to CASE 3.2. We
again consider a maximal sequence (maximal with respect to prefix order) where only condition
3c applies. Formally, $T''$ is a maximal sequence with $T''(0) = T'(\beta(i))$ such that

\[ \forall j.\ 0 \le j < \mathit{len}(T'') \Rightarrow (T(\alpha(i)) \mathrel{R} T''(j)) \tag{3.32} \]

and

\[ \forall j.\ 0 \le j < \mathit{len}(T'') - 1 \Rightarrow (T''(j) \dashrightarrow' T''(j+1)) \tag{3.33} \]

and for each $j$ such that $0 \le j < \mathit{len}(T'') - 1$ we have

\[ \nexists a.\ (T(\alpha(i)) \dashrightarrow a) \land a \mathrel{R} T''(j+1) \tag{3.34} \]

(which states that condition 3a does not hold) and

\[ \nexists a.\ (T(\alpha(i)) \dashrightarrow a) \land a \mathrel{R} T''(j) \]

(which states that condition 3b does not hold). There may be several choices for the sequence
$T''$. Any choice satisfying the stated conditions is acceptable.

Note that $T''$ contains at least two elements since condition 3c (the assumption in this case)
states that there is an $a'$ such that
$(T'(\beta(i)) \dashrightarrow' a') \land (T(\alpha(i)) \mathrel{R} a')$. This implies that
there is a sequence satisfying these conditions with $T''(0) = T'(\beta(i))$ and
$T''(1) = a'$. Let $n + 1$ be the length of this sequence (thus making $T''(n)$ the last
element in the sequence).
We have $T(\alpha(i)) \mathrel{R} T''(n)$ from (3.32) and we have
$T(\alpha(i)) \dashrightarrow T(\alpha(i) + 1)$ due to the fact that we are in CASE 3. Thus,
condition 3 of Definition 29 states that either condition 3a, 3b, or 3c holds for the
transition $T(\alpha(i)) \dashrightarrow T(\alpha(i) + 1)$ and $T''(n)$.

Due to the maximality of $T''$, we cannot have that only condition 3c holds. If this were the
case, then we would have $T''(n) \dashrightarrow' a'$ for some $a'$ and $T''$ could be extended
by setting $T''(n+1) = a'$, thus contradicting the maximality of $T''$.

Condition 3b also cannot hold. Suppose it did. Then we would have

\[ T(\alpha(i)) \dashrightarrow T(\alpha(i) + 1) \quad\text{and}\quad T(\alpha(i) + 1) \mathrel{R} T''(n) \]

Since we already have $T(\alpha(i)) \mathrel{R} T''(n-1)$ and
$T''(n-1) \dashrightarrow' T''(n)$ by (3.32) and (3.33), this implies that condition 3a holds
of the transition $T(\alpha(i)) \dashrightarrow T(\alpha(i) + 1)$ and $T''(n-1)$. This
contradicts (3.34).

Thus, 3a must hold for $T(\alpha(i)) \dashrightarrow T(\alpha(i) + 1)$ and $T''(n)$, implying
that there is a $b$ such that $T''(n) \dashrightarrow' b$ and $T(\alpha(i) + 1) \mathrel{R} b$.
We handle this case similarly to CASE 3.1. We set $\alpha(i+1) = \alpha(i) + 1$ and
$\beta(i+1) = \beta(i) + n + 1$. We let $T'(j) = T''(j - \beta(i))$ for
$\beta(i) \le j \le \beta(i) + n$. We set $T'(\beta(i+1))$ equal to $b$. Since $T$ contains
elements at least through index $\alpha(i+1)$ and $T'$ contains indices at least through
$\beta(i+1)$, we have (3.28). From (3.32) and the fact that R-relation implies E-equivalence,
we have (3.29). We also have $T(\alpha(i+1)) \mathrel{R} b$ which implies
$T(\alpha(i+1)) \mathrel{R} T'(\beta(i+1))$, completing our proof requirements. □


Simulation gives us a method of proving E-stuttering trace containment that only involves
examining local transitions. Stuttering simulation is a stronger property than stuttering
trace containment and actually preserves all ACTL*\X properties [Manolios, 2001]. Though we
are only interested in LTSL, which is a subset of ACTL*\X, we will nevertheless use stuttering
simulation as our main proof method, as its local character makes reasoning much easier.
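
The block structure built in the proof of Theorem 18 can be checked mechanically when both traces are finite. The sketch below is an illustration under assumptions of our own: `blocks_match` is a hypothetical name, the block starts are given as lists whose last entry is the length of the trace (a convention of this sketch, not of Definition 18), and the traces are assumed non-empty.

```python
def blocks_match(T, T2, alpha, beta, E):
    """Check (3.26) and (3.27) for finite, non-empty traces T and T2.

    alpha and beta list the start index of each block; by the convention of
    this sketch, the last entry of each list is the length of its trace.
    E is a function deciding whether two states are E-equivalent.
    """
    if alpha[0] != 0 or beta[0] != 0:                  # (3.26)
        return False
    if alpha[-1] != len(T) or beta[-1] != len(T2):
        return False
    if len(alpha) != len(beta):                        # traces end in the same block
        return False
    for starts in (alpha, beta):                       # starts strictly increase
        if any(a >= b for a, b in zip(starts, starts[1:])):
            return False
    for i in range(len(alpha) - 1):                    # (3.27), block by block
        for j in range(alpha[i], alpha[i + 1]):
            for k in range(beta[i], beta[i + 1]):
                if not E(T[j], T2[k]):
                    return False
    return True


def same(a, b):
    return a == b


T = [0, 0, 1, 1, 1, 2]   # a trace with stuttering steps
T2 = [0, 1, 2]           # the same trace with the stuttering removed
good = blocks_match(T, T2, [0, 2, 5, 6], [0, 1, 2, 3], same)
bad = blocks_match(T, T2, [0, 3, 5, 6], [0, 1, 2, 3], same)
```

The second call fails because its first block of T would contain the state 1, which is not equivalent to the single state 0 in the first block of T2.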
3.5 Properties of Interest

While  we  have  shown  that  stuttering  equivalence  preserves  all  LTSL  properties,  there  are 
certain  specific  properties  that  we  will  focus  on  in  our  examples  and  experiments. 

Definition 30.

1. A program $P$ is safe iff $P \models_X \neg(F(\mathit{err}))$.

2. A program $P$ is terminating iff $P \models_X F(\mathit{final} \lor \mathit{err})$.

3. A formula $Q$ is invariant for $P$ at $l$ iff $P \models_X G(\mathit{atloc}(l) \supset Q)$.

4. An expression $e^I_B$ bounds an expression $e^I$ iff $P \models_X G(e^I \le e^I_B)$.

In  less  formal  terms,  the  safe  property  states  that  the  execution  state  error  is  never 
reached.  The  terminating  property  holds  exactly  when  the  program  has  no  infinite  traces. 
The  reason  this  statement  is  equivalent  to  the  LTSL  formula  given  above  is  that  neither 
of  the  states  error  nor  final(s,  h)  can  ever  make  a  transition.  Thus,  any  trace  containing 
error  must  be  a  finite  trace  with  final  state  error  (and  similarly  for  final(s,  h)). 

The invariant at $l$ property holds exactly when $Q$ is an invariant at location $l$. This
means that whenever the program jumps to label $l$, the current store and heap satisfy $Q$.
The bounds property states that at every step in the execution of program $P$, the value of
the expression $e^I_B$ (as evaluated in the current state) is greater than or equal to the
value of the expression $e^I$ (in other words, $e^I_B$ is an upper bound of $e^I$). In general,
when we consider bounds we will be interested in finding a bound for a variable in terms of
specific other, designated values. For example, we may be interested in finding a bound on the
size of a function’s outputs in terms of its inputs.
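
To make three of these properties concrete, the sketch below evaluates them on a single finite trace. This is an illustration only: the definition quantifies over all traces of P, whereas the code inspects one, and the state encoding (dictionaries with hypothetical keys `loc`, `status`, `x`, `n`) and the function names are assumptions of ours.

```python
def is_safe(trace):
    """No state of the trace is the error state."""
    return all(state["status"] != "err" for state in trace)


def is_invariant_at(trace, label, q):
    """q holds in every state of the trace that is at the given label."""
    return all(q(state) for state in trace if state["loc"] == label)


def is_bounded_by(trace, e, e_bound):
    """The value of e never exceeds the value of e_bound along the trace."""
    return all(e(state) <= e_bound(state) for state in trace)


# Hypothetical trace of a loop that counts x up to n.
trace = [
    {"loc": "L1", "status": "run", "x": 0, "n": 3},
    {"loc": "L1", "status": "run", "x": 1, "n": 3},
    {"loc": "L1", "status": "run", "x": 2, "n": 3},
    {"loc": "L1", "status": "final", "x": 3, "n": 3},
]

safe = is_safe(trace)
inv = is_invariant_at(trace, "L1", lambda s: s["x"] >= 0)
bounded = is_bounded_by(trace, lambda s: s["x"], lambda s: s["n"])
too_tight = is_bounded_by(trace, lambda s: s["x"], lambda s: 2)
```

Termination is deliberately left out: it is a statement about the absence of infinite traces and cannot be decided by inspecting one finite trace.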
Chapter 4

Instrumented  Programs 


The translation from heap-manipulating programs to numeric abstractions proceeds via an
intermediate step that we call instrumented programs. These are programs that include the
original program commands along with commands that update a set of instrumentation variables
$V$, drawn from a set that is disjoint from the set of program variables. The additional
commands describe how numeric counts, such as the size of a data structure, change during
execution of the program. We call such additional commands instrumentation commands. The
instrumentation commands are added to the instrumented program as a proof of memory safety is
constructed and make use of the intermediate results of this safety analysis. Once the
instrumented program has been constructed, the numeric abstraction is extracted from it by a
simple syntax-directed translation. This step is discussed in Section 4.4. The end result is
that the numeric abstraction $=_{V'}$-stuttering simulates the original program, where $V'$ is
a subset of the program and instrumentation variables that depends on the details of the
construction of the abstraction. This results in a numeric abstraction that is sound for both
safety and liveness properties over variables in $V'$.
4.1 Theory


Informally, an instrumented program for program $P$ is a program $\hat{P}$ that contains all
the commands and control-flow of $P$, but with the addition of some commands and branches that
make use of a set of instrumentation variables that are separate from the program variables.
These instrumentation variables play a role similar to that of auxiliary variables in program
logics for concurrency [Owicki and Gries, 1976].

In Figure 4.2 we give a set of inference rules for establishing the judgment
$\Gamma \vdash \hat{P} \blacktriangleright_V P$, which is read “$\hat{P}$ is an instrumented
version of $P$” and also explicitly lists $V$, the set of instrumentation variables, and
$\Gamma$, a mapping from labels to separation logic formulae that specifies program invariants
for each label. This judgment is intended to capture the fact that $\hat{P}$ simulates $P$
when both are started from states satisfying $\Gamma(\mathit{initloc}(P))$ (the invariant for
the initial location). The soundness theorem for the system, proved in Section 4.3, states
that the proof rules described in this chapter do ensure the existence of such a simulation.

Figure 4.1 defines a similar judgment at the level of continuations. The judgment for
continuations, which has the form $\Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k$,
should be provable only if, when started from a state satisfying $Q$, the continuation
$\hat{k}$ simulates the continuation $k$. For continuations, this simulation means that
$\hat{k}$ can match any transition $k$ makes and the continuations eventually either both
halt, both reach an error, or both jump to the same label.

The simulation relation we obtain in Section 4.3 enforces a relationship between the memory
states of the two programs. The instrumented program $\hat{P}$ modifies variables in $V$, but
the original program $P$ does not. The simulation relation ensures that, despite these extra
commands involving new variables, for every execution trace $T$ of the original program, there
is a matching execution trace $T'$ in the instrumented program such that $T$ and $T'$ agree on
the values of the non-instrumentation variables (that is, all variables in the original
program). This connection lets us check properties of $P$ by instead checking them on
$\hat{P}$. For example, if $x$ is a program variable and $x$ is never assigned the value 0 in
executions of $\hat{P}$ then we can conclude that it is also never assigned the value 0 in
executions of $P$.

Note that the property of being a valid instrumentation is defined with respect to program
invariants $\Gamma$ and, in the case of continuations, with respect to a precondition $Q$. If
we view the construction of a proof in the system given in Figure 4.1 as proceeding in a
bottom-up manner, then instrumentation proceeds in lock-step with the derivation of a partial
correctness proof of the program. The rules Command and Branch tell us how to update the
precondition to reflect the results of executing an existing command and rules Inst-Assign,
Inst-Disj, Inst-Exists, and Inst-Assume tell us which new commands can be inserted. The triple
$\{Q\}\ c\ \{Q'\}$ in the Command rule is a partial correctness triple and holds iff

\[
\forall s, h.\ ((s, h) \models Q) \Rightarrow
(\mathit{error} \notin \llbracket c \rrbracket (s, h)) \land
(\forall (s', h') \in \llbracket c \rrbracket (s, h).\ (s', h') \models Q')
\]

Note that such triples can be found only if $c$ is memory safe under precondition $Q$ (this is
required due to the clause $\mathit{error} \notin \llbracket c \rrbracket (s, h)$ and the fact
that error is the result of any command that violates memory safety). For this reason, the
rules in Figures 4.1 and 4.2 will only let us derive instrumented versions of a program if the
original program is memory safe.
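
The definition of a partial correctness triple can be exercised on a small, finite state space. The following sketch is illustrative only: states are simplified to pairs (x, n) rather than a store and heap, `holds_triple` and `deref_and_decrement` are hypothetical names, and a command is modeled as a function from a state to the set of its possible outcomes.

```python
ERROR = "error"


def holds_triple(pre, cmd, post, states):
    """Check {pre} cmd {post} over an explicit, finite collection of states."""
    for state in states:
        if not pre(state):
            continue
        outcomes = cmd(state)
        if ERROR in outcomes:            # the command is not memory safe here
            return False
        if not all(post(out) for out in outcomes):
            return False
    return True


# States are (x, n) pairs; x is None models a nil pointer.
states = [(x, n) for x in (None, "node") for n in range(4)]


def deref_and_decrement(state):
    """Hypothetical command that dereferences x and then decrements n."""
    x, n = state
    if x is None:
        return {ERROR}                   # dereferencing nil violates memory safety
    return {(x, n - 1)}


def post(state):
    return state[1] >= 0


with_safe_pre = holds_triple(
    lambda s: s[0] is not None and s[1] > 0, deref_and_decrement, post, states
)
with_weak_pre = holds_triple(lambda s: s[1] > 0, deref_and_decrement, post, states)
```

The second triple fails only because its precondition admits states in which x is nil, which mirrors the remark that a triple exists only when the command is memory safe under the precondition.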

A key difference between this approach to command insertion and the auxiliary variable
approach lies with the Inst-Exists rule. This rule tells us that if we insert an assignment
$x := ?$, then we can remove an existential quantifier on $x$. This may seem odd, since
$\{\exists x.\ Q\}\ x := ?\ \{Q\}$ is not a valid partial correctness triple. However,
inserting such a command and reasoning from the unquantified formula is sound because our
soundness result is based on simulation. To maintain soundness, we must show that if the
original program can take a step, then there exists a step in the instrumented program that
takes us to a related state. The fact that the semantics of $x := ?$ includes all possible
updates to $x$ allows us to find such a step. Similarly, the Inst-Disj rule allows us to
reason separately about each side of a disjunction. Again, this is valid because we are
targeting a correspondence between the two programs that is based on simulation. We say more
about these connections in Section 4.7.
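Halt
\[ \dfrac{}{\Gamma \vdash \{Q\}\ \mathtt{halt} \blacktriangleright_V \mathtt{halt}} \]

Abort
\[ \dfrac{}{\Gamma \vdash \{Q\}\ \mathtt{abort} \blacktriangleright_V \mathtt{abort}} \]

Goto
\[ \dfrac{\Gamma(l) = Q}{\Gamma \vdash \{Q\}\ \mathtt{goto}\ l \blacktriangleright_V \mathtt{goto}\ l} \]

Command
\[ \dfrac{\{Q\}\ c\ \{Q'\} \qquad \Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k}
         {\Gamma \vdash \{Q\}\ (c;\ \hat{k}) \blacktriangleright_V (c;\ k)} \]

Strengthening
\[ \dfrac{Q \Rightarrow Q' \qquad \Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k}
         {\Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k} \]

Branch
\[ \dfrac{\forall i.\ (\Gamma \vdash \{Q \land e^b_i\}\ \hat{k}_i \blacktriangleright_V k_i)}
         {\Gamma \vdash \{Q\}\ \mathtt{branch}\ \ldots, e^b_i \Rightarrow \hat{k}_i, \ldots\ \mathtt{end}
          \blacktriangleright_V \mathtt{branch}\ \ldots, e^b_i \Rightarrow k_i, \ldots\ \mathtt{end}} \]

False
\[ \dfrac{}{\Gamma \vdash \{\mathit{false}\}\ \mathtt{halt} \blacktriangleright_V k} \]

Inst-Assign
\[ \dfrac{\{Q\}\ x^I := e^I\ \{Q'\} \qquad \Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k \qquad x^I \in V}
         {\Gamma \vdash \{Q\}\ (x^I := e^I;\ \hat{k}) \blacktriangleright_V k} \]

Inst-Disj
\[ \dfrac{\Gamma \vdash \{Q_1\}\ \hat{k}_1 \blacktriangleright_V k \qquad \Gamma \vdash \{Q_2\}\ \hat{k}_2 \blacktriangleright_V k}
         {\Gamma \vdash \{Q_1 \lor Q_2\}\ \mathtt{branch}\ \mathit{true} \Rightarrow \hat{k}_1,\ \mathit{true} \Rightarrow \hat{k}_2\ \mathtt{end} \blacktriangleright_V k} \]

Inst-Exists
\[ \dfrac{\Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k \qquad x^I \in V}
         {\Gamma \vdash \{\exists x^I.\ Q\}\ (x^I := ?^I;\ \hat{k}) \blacktriangleright_V k} \]

Inst-Assume
\[ \dfrac{Q \Rightarrow e^b \qquad \Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k}
         {\Gamma \vdash \{Q\}\ \mathtt{assume}(e^b);\ \hat{k} \blacktriangleright_V k} \]

Figure 4.1: Rules for establishing that $\Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k$,
read “under precondition $Q$, with label invariants $\Gamma$, the continuation $\hat{k}$ is an
instrumented version of $k$ with instrumentation variables $V$.” Premises of the form
$\{Q\}\ c\ \{Q'\}$ are partial correctness triples and hold iff for all $s, h$,
$(s, h) \models Q$ implies
$(\forall (s', h') \in \llbracket c \rrbracket (s, h).\ (s', h') \models Q')$. Premises of the
form $Q \Rightarrow Q'$ hold iff $Q \Rightarrow Q'$ is valid (that is,
$(s, h) \models (Q \Rightarrow Q')$ for all $s, h$).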
Inst-Prog
\[ \dfrac{\mathit{fv}(P) \cap V = \emptyset \qquad \mathit{dom}(\hat{P}) = \mathit{dom}(P) \qquad
          \mathit{initloc}(\hat{P}) = \mathit{initloc}(P) \qquad
          \forall l \in \mathit{dom}(P).\ (\Gamma \vdash \{\Gamma(l)\}\ \hat{P}(l) \blacktriangleright_V P(l))}
         {\Gamma \vdash \hat{P} \blacktriangleright_V P} \]

Figure 4.2: Rule for proving that $\hat{P}$ is an instrumented version of $P$. The function
$\mathit{fv}(P)$ gives the set of variables occurring free in $P$. Since there are no binding
constructs in our language, this is just the set of all variables appearing in $P$.


Notation As before, we will use circled numbers to label continuations in our examples. To
help distinguish between the instrumented program and the original program, we will adopt the
convention of using black numbers in white circles (①, ②, ...) to represent control points in
the original program and white numbers in black circles (❶, ❷, ...) to represent control
points in the instrumented program. We will also assign numbers such that if the original
program contains a continuation labeled ② and the instrumented program contains a continuation
labeled ❷ then we will have $\Gamma \vdash \{Q\}$ ❷ $\blacktriangleright_V$ ② for some
$\Gamma$, $V$, and $Q$. Intuitively, this indicates that the control points ② and ❷ are
related by the simulation relation used to demonstrate soundness.


4.1.1  Common  Cases 

The rules Inst-Assign, Inst-Disj, Inst-Exists and Inst-Assume allow us to express various
facts about the behavior of numeric properties of data structures. These facts generally fall
into four categories.
Deterministic Size Changes

We can record deterministic size changes using the Inst-Assign rule. Suppose we have the
following definition of singly-linked list segments.

\[
\begin{aligned}
\mathit{ls}(n, \mathit{start}, \mathit{end}) =\ & (\mathit{emp} \land \mathit{start} = \mathit{end} \land n = 0) \\
\lor\ & (n > 0 \land (\exists z.\ (\mathit{start} \mapsto [\mathit{next} : z]) * \mathit{ls}(n - 1, z, \mathit{end})))
\end{aligned}
\]

and execute the code given below.

    L1 : ① branch x ≠ nil ⇒ ② x := x.next; ③ goto L1,
                  x = nil ⇒ ④ halt end

An invariant of this code at label L1 is
$\exists n_1, n_2, x'.\ \mathit{ls}(n_1, x', x) * \mathit{ls}(n_2, x, \mathit{nil})$. In order
to track how the sizes of the segments are changing, we can generate an instrumented program
for the code above. Let
$\Gamma(L_1) = \exists x'.\ \mathit{ls}(n_1, x', x) * \mathit{ls}(n_2, x, \mathit{nil})$. Then
the code below is an instrumented version of the code above with instrumentation variables
$n_1$ and $n_2$ (the assignments to $n_1$ and $n_2$ are added with the Inst-Assign rule). The
variable $n_2$ tracks the quantity “length of the list segment from x to nil” and $n_1$ tracks
the quantity “length of the list segment ending at x.”

    L1 : ❶ branch x ≠ nil ⇒ ❷ x := x.next; ❸ n1 := n1 + 1;
                                n2 := n2 − 1; goto L1,
                  x = nil ⇒ ❹ halt end

Note that the existential quantification is dropped in the invariant used for the instrumented
program (in $\Gamma(L_1)$ the variables $n_1$ and $n_2$ appear unquantified). This is possible
because we are now updating $n_1$ and $n_2$ in the body of the loop. Viewed another way, it is
by committing to an invariant in which $n_1$ and $n_2$ are unquantified that we are forced to
write the appropriate updates to $n_1$ and $n_2$ in the body (if we update $n_1$ or $n_2$
incorrectly, we will not be able to show that $\Gamma(L_1)$ is an invariant). Figure 4.3 gives
a derivation showing that the instrumentation we presented is a valid instrumented version of
the original program according to the rules in Figures 4.1 and 4.2.
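
The effect of the instrumentation can be observed by running the instrumented loop on a concrete list. The sketch below is an illustration, not the extracted numeric abstraction: the `Node` class and the helper `seg_len` are assumptions of ours, the witness for the quantified x' is taken to be the head of the list, and the initial values of n1 and n2 are chosen so that the invariant at L1 holds on entry.

```python
class Node:
    """A singly-linked list cell with a single next field."""

    def __init__(self, next=None):
        self.next = next


def seg_len(start, end):
    """Length of the list segment from start up to (not including) end."""
    n = 0
    while start is not end:
        start = start.next
        n += 1
    return n


def traverse_instrumented(head):
    """Run the instrumented traversal and record (n1, n2) at each visit to L1."""
    x = head
    n1, n2 = 0, seg_len(head, None)      # chosen so the invariant holds initially
    seen = []
    while True:
        # Invariant at L1, with x' instantiated to head:
        # ls(n1, head, x) * ls(n2, x, nil)
        assert n1 == seg_len(head, x) and n2 == seg_len(x, None)
        seen.append((n1, n2))
        if x is None:                    # x = nil: halt
            return seen
        x = x.next                       # original command
        n1 = n1 + 1                      # instrumentation command
        n2 = n2 - 1                      # instrumentation command


three = Node(Node(Node()))
trace = traverse_instrumented(three)
```

If either update were written incorrectly (for example, incrementing n2), the assertion would fail on the second visit to L1, which is the executable counterpart of failing to show that the invariant at L1 is preserved.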
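The derivation is listed from the leaves to the root; each step names the rule applied, and
$\blacktriangleright$ abbreviates $\blacktriangleright_{n_1, n_2}$.

Goto
\[ \dfrac{\Gamma(L_1) = Q_4}
         {\Gamma \vdash \{Q_4\}\ \mathtt{goto}\ L_1 \blacktriangleright \mathtt{goto}\ L_1} \]

I-A
\[ \dfrac{\{Q_3\}\ n_2 := n_2 - 1\ \{Q_4\} \qquad \Gamma \vdash \{Q_4\}\ \mathtt{goto}\ L_1 \blacktriangleright \mathtt{goto}\ L_1}
         {\Gamma \vdash \{Q_3\}\ n_2 := n_2 - 1;\ \mathtt{goto}\ L_1 \blacktriangleright \mathtt{goto}\ L_1} \]

I-A
\[ \dfrac{\{Q_2\}\ n_1 := n_1 + 1\ \{Q_3\} \qquad \Gamma \vdash \{Q_3\}\ n_2 := n_2 - 1;\ \mathtt{goto}\ L_1 \blacktriangleright \mathtt{goto}\ L_1}
         {\Gamma \vdash \{Q_2\}\ \text{❸} \blacktriangleright \text{③}} \]

Cmd
\[ \dfrac{\{Q_1 \land x \ne \mathit{nil}\}\ x := x.\mathit{next}\ \{Q_2\} \qquad \Gamma \vdash \{Q_2\}\ \text{❸} \blacktriangleright \text{③}}
         {\Gamma \vdash \{Q_1 \land x \ne \mathit{nil}\}\ \text{❷} \blacktriangleright \text{②}} \]

Halt
\[ \dfrac{}{\Gamma \vdash \{Q_1 \land x = \mathit{nil}\}\ \mathtt{halt} \blacktriangleright \mathtt{halt}} \]

Branch
\[ \dfrac{\Gamma \vdash \{Q_1 \land x \ne \mathit{nil}\}\ \text{❷} \blacktriangleright \text{②} \qquad
          \Gamma \vdash \{Q_1 \land x = \mathit{nil}\}\ \mathtt{halt} \blacktriangleright \mathtt{halt}}
         {\Gamma \vdash \{Q_1\}\ \text{❶} \blacktriangleright \text{①}} \]

where

\[
\begin{aligned}
\Gamma(L_1) &= \exists x'.\ \mathit{ls}(n_1, x', x) * \mathit{ls}(n_2, x, \mathit{nil}) \\
Q_1 &= \exists x'.\ \mathit{ls}(n_1, x', x) * \mathit{ls}(n_2, x, \mathit{nil}) \\
Q_2 &= \exists x'.\ \mathit{ls}(n_1 + 1, x', x) * \mathit{ls}(n_2 - 1, x, \mathit{nil}) \\
Q_3 &= \exists x'.\ \mathit{ls}(n_1, x', x) * \mathit{ls}(n_2 - 1, x, \mathit{nil}) \\
Q_4 &= \exists x'.\ \mathit{ls}(n_1, x', x) * \mathit{ls}(n_2, x, \mathit{nil})
\end{aligned}
\]

Figure 4.3: Derivation showing an instrumented program that performs a deterministic update of
a variable representing the length of a linked list. I-A stands for Inst-Assign.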
Non-deterministic Size Changes

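Suppose we have the following definition of a binary tree, where $n$ represents the number of
nodes in the tree.

\[
\begin{aligned}
\mathit{tree}(n, r) =\ & (n = 0 \land r = \mathit{nil}) \\
\lor\ & (n > 0 \land \exists n_1, n_2.\ n = n_1 + n_2 + 1\ \land \\
      & \quad \exists \mathit{lc}, \mathit{rc}.\ r \mapsto [\mathit{left} : \mathit{lc}, \mathit{right} : \mathit{rc}]
        * \mathit{tree}(n_1, \mathit{lc}) * \mathit{tree}(n_2, \mathit{rc}))
\end{aligned}
\]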

If  we  now  consider  code  for  descending  through  the  tree,  we  can  obtain  update  commands 
similar  to  those  obtained  for  the  linked  list  example  above.  However,  when  a  pointer  p  is 
advanced  through  a  list,  the  change  in  the  size  of  the  list  at  p  is  deterministic  (it  always 
decreases  by  one).  In  the  case  of  trees,  if  some  pointer  p  descends  to  the  left  child,  we  do 
not  have  a  deterministic  function  that  describes  how  the  number  of  nodes  reachable  from 
p  changes.  Instead,  there  is  a  relation  between  the  two  quantities  which  specifies  that  the 
number  of  nodes  in  the  left  sub-tree  can  range  from  zero  to  one  less  than  the  number  of 
nodes  in  the  full  tree.  We  will  use  non-deterministic  assignment  to  capture  this  update 
relation. 

The original program we consider is given below. The program checks whether the tree at r is empty and, if it is not, it non-deterministically chooses a child to descend to. We have marked with ① a location of interest during creation of the instrumented program.

    L1 : branch r ≠ nil ⇒ ① branch true ⇒ r := r.left;
                                            goto L1,
                                     true ⇒ r := r.right;
                                            goto L1 end,
                r = nil ⇒ halt end
Let Γ(L1) = tree(n, r) * true (where true is used to capture the part of the heap no longer below r in the tree) and let Q be the following formula.

    Q = (n > 0 ∧ n = n1 + n2 + 1) ∧
        ∃lc, rc. r ↦ [left: lc, right: rc] * tree(n1, lc) * tree(n2, rc) * true

We will now construct an instrumented version of this program using the following process, obtained by taking an algorithmic, bottom-up reading of the inference rules given in Figure 4.1.

1. Start with the continuation at L1 and the invariant Γ(L1).

2. Copy commands from the original program over to the instrumented program, updating the current invariant using the rules Branch and Command.

3. If a halt or abort is encountered, then we can stop analyzing this branch.

4. If a goto L command is encountered, then we insert instrumentation commands using rules Inst-Exists, Inst-Assume, and Inst-Assign in order to establish the invariant Γ(L).

This  process  is  not  general  enough  to  give  us  the  instrumentation  we  want  in  all  cases  (for 
example  it  will  never  insert  new  branches  using  the  Inst-Branch  rule)  but  it  will  suffice 
for  this  example.  We  give  a  more  general  procedure  in  Chapter  5. 

Following steps 1 and 2 we can obtain the formula ∃n1, n2. Q for the invariant at the position labeled with ① in the original program. We now must give an instrumentation of each case of the branch at this location. Let us consider first the case that chooses the left child. This case executes the continuation r := r.left; goto L1. A valid post-condition after executing r := r.left is the following.

    Q' = ∃n1, n2. n > 0 ∧ (n = n1 + n2 + 1) ∧
         ∃r', rc. r' ↦ [left: r, right: rc] * tree(n1, r) * tree(n2, rc) * true
We now need to add instrumentation commands that allow us to re-establish the invariant Γ(L1), which is tree(n, r) * true. The commands we will add are the following, which are justified using the Inst-Exists, Inst-Assume, and Inst-Assign rules. A full derivation is given in Figure 4.4.

    n1 := ?; n2 := ?; assume(n = n1 + n2 + 1); n := n1

Executing these leads us to the invariant

    ∃r', rc. r' ↦ [left: r, right: rc] * tree(n, r) * tree(n2, rc) * true

which is labeled Q2 in Figure 4.4. This formula implies tree(n, r) * true, which is Γ(L1). This allows us to finish the processing of this branch by using the Strengthening rule to show that we have the invariant tree(n, r) * true here. As this is equal to Γ(L1), this lets us use the Goto rule to process the goto L1 command.

We  can  perform  the  same  analysis  of  the  branch  that  descends  into  the  right  sub-tree 
and  obtain  the  instrumentation  commands  below. 

    n1 := ?; n2 := ?; assume(n = n1 + n2 + 1); n := n2

Putting  this  all  together,  the  full  instrumented  version  of  this  program  is  given  below. 

    L1 : branch r ≠ nil ⇒ ① branch true ⇒ r := r.left; n1 := ?; n2 := ?;
                                            assume(n = n1 + n2 + 1);
                                            n := n1; goto L1,
                                     true ⇒ r := r.right; n1 := ?; n2 := ?;
                                            assume(n = n1 + n2 + 1);
                                            n := n2; goto L1 end,
                r = nil ⇒ halt end

Recall that we generated this program in a fairly directed manner. We copied commands from the original program into the instrumented program and only inserted instrumentation commands when this was necessary to establish an invariant in Γ. It still required some ingenuity to derive the post-conditions of commands and determine which instrumentation commands to insert (although the former could be handled by using strongest post-conditions). In Chapter 5 we will describe how to automate all portions of the instrumentation process.

Formulas used in the derivation:

    Q1    = n > 0 ∧ (n = n1 + n2 + 1) ∧
            ∃r', rc. r' ↦ [left: r, right: rc] * tree(n1, r) * tree(n2, rc) * true
    Q2    = ∃r', rc. r' ↦ [left: r, right: rc] * tree(n, r) * tree(n2, rc) * true
    Q3    = tree(n, r) * true
    Γ(L1) = tree(n, r) * true

Derivation, read from the leaf to the root. Each judgment follows from the listed premises by the named rule.

    Goto:         from Γ(L1) = Q3,
                  Γ ⊢ {Q3} goto L1 ▶_{n,n1,n2} goto L1

    Strengthen:   from Q2 ⇒ Q3 and the judgment above,
                  Γ ⊢ {Q2} goto L1 ▶_{n,n1,n2} goto L1

    Inst-Assign:  from {Q1} n := n1 {Q2} and the judgment above,
                  Γ ⊢ {Q1} n := n1; goto L1 ▶_{n,n1,n2} goto L1

    Inst-Assume:  from Q1 ⇒ (n = n1 + n2 + 1) and the judgment above,
                  Γ ⊢ {Q1} assume(n = n1 + n2 + 1); n := n1; goto L1 ▶_{n,n1,n2} goto L1

    I-E:          Γ ⊢ {∃n2. Q1} n2 := ?; assume(n = n1 + n2 + 1); n := n1; goto L1 ▶_{n,n1,n2} goto L1

    I-E:          Γ ⊢ {∃n1, n2. Q1} n1 := ?; n2 := ?; assume(n = n1 + n2 + 1); n := n1; goto L1 ▶_{n,n1,n2} goto L1

Figure 4.4: Derivation showing that, for the tree traversal program on page 136, the commands given re-establish the invariant Γ(L1). We write I-E as an abbreviation for Inst-Exists and abbreviate Strengthening as Strengthen.

Our semi-automated process had us insert instrumentation commands only immediately before goto commands. If we had chosen different points at which to insert the instrumentation commands, we could have obtained the code below, which places the commands that affect n1 and n2 before the branch instead of replicating them in each branch case.

    L1 : branch r ≠ nil ⇒ n1 := ?; n2 := ?;
                          assume(n = n1 + n2 + 1);
                          branch true ⇒ r := r.left;
                                        n := n1; goto L1,
                                 true ⇒ r := r.right;
                                        n := n2; goto L1 end,
                r = nil ⇒ halt end

Both this code and our previously derived code are valid instrumentations of the original program, as can be verified using the rules in Figure 4.1. However, the second, shorter program may be easier to verify using automated tools. In general, the fewer statements, variables, and branches a program contains, the easier it is for automated tools to handle. We say more about this in Section 5.11, which discusses our experimental results.
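To make the role of the non-deterministic assignments concrete, the following Python sketch runs the tree descent on an actual tree while tracking the instrumentation variable n. It is only an illustration under stated assumptions: the names Node, size, random_tree and descend are hypothetical, the non-deterministic choices n1 := ? and n2 := ? are resolved by an oracle that picks the concrete sub-tree sizes, and assume is modeled with assert. The sketch follows the variant in which n1 and n2 are chosen before the inner branch.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def size(r: Optional[Node]) -> int:
    """Number of nodes reachable from r; plays the role of n in tree(n, r)."""
    return 0 if r is None else 1 + size(r.left) + size(r.right)


def random_tree(n: int, rng: random.Random) -> Optional[Node]:
    """Build a tree with exactly n nodes and a random shape."""
    if n == 0:
        return None
    n1 = rng.randint(0, n - 1)  # size of the left sub-tree
    return Node(random_tree(n1, rng), random_tree(n - 1 - n1, rng))


def descend(r: Optional[Node], rng: random.Random) -> list[int]:
    """Run the instrumented loop and return the successive values of n."""
    n = size(r)
    history = [n]
    while r is not None:                      # branch r != nil
        # n1 := ?; n2 := ?  (the oracle picks the concrete sizes)
        n1, n2 = size(r.left), size(r.right)
        assert n == n1 + n2 + 1               # assume(n = n1 + n2 + 1)
        if rng.random() < 0.5:                # branch true => ...
            r, n = r.left, n1                 # r := r.left;  n := n1
        else:
            r, n = r.right, n2                # r := r.right; n := n2
        assert n == size(r)                   # invariant tree(n, r) * true
        history.append(n)
    return history
```

Because n1 and n2 are non-negative and sum to n - 1, every recorded value of n is strictly smaller than the previous one, which is the numeric fact a termination prover would use.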

Branch  Condition  Translation 

Let us return to the linked-list example from before. The instrumented code that we generated is replicated below.

    L1 : ① branch x ≠ nil ⇒ ② x := x.next; ③ n1 := n1 + 1;
                             n2 := n2 - 1; goto L1,
                  x = nil ⇒ ④ halt end

This summarizes how n1 and n2 change during each iteration. Recall that n1 and n2 are the lengths of the list segments in the invariant ∃x'. ls(n1, x', x) * ls(n2, x, nil). The instrumentation commands in the program above are sufficient to prove some properties of the list lengths. For example, we can show that the sum n1 + n2 is invariant at location L1. However, we have not added any commands to indicate how n1 and n2 influence the truth of the branch condition. Thus, though we would like to use n1 and n2 to reason about termination of the code, we cannot obtain a ranking function because n1 and n2 are not bounded.

To obtain a more precise numeric abstraction that will be useful for termination reasoning, we need to notice that only certain values of n2 are possible when the branch condition x = nil is true. Similarly, when x ≠ nil is true, this also gives us information on the possible values of n2. Specifically, if x = nil then n2 = 0 and if x ≠ nil then n2 > 0. To record this information and make it available to subsequent analyses, we can use the Inst-Assume rule to insert an assumption on n2. The final instrumented program then becomes the following.

    L1 : branch x ≠ nil ⇒ assume(n2 > 0); x := x.next;
                          n1 := n1 + 1; n2 := n2 - 1; goto L1,
                x = nil ⇒ assume(n2 = 0); halt end

It is now clear that, for any n2, the program terminates. This is the case because n2 decreases by one during each iteration and once n2 = 0, the first assume statement prevents us from executing the loop body again. Values of n2 such that n2 < 0 are not possible as the two assume conditions together ensure that the only valid executions are those for which n2 ≥ 0 in the initial state. Ruling out the states where n2 < 0 does not pose a problem for soundness since the precondition ∃x'. ls(n1, x', x) * ls(n2, x, nil) implies that n2 ≥ 0.
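The effect of the two assume statements can be seen in a small Python sketch. This is an illustration only, with several assumptions: Cell, make_list, instrumented and numeric_only are hypothetical names, assume is modeled with assert (in the programs above a failed assume blocks the execution rather than raising an error), and numeric_only is a hand-written projection of the loop onto n1 and n2 with the heap commands dropped.

```python
from typing import Optional


class Cell:
    """A singly linked list cell with a `next` field."""

    def __init__(self, next: Optional["Cell"] = None) -> None:
        self.next = next


def make_list(length: int) -> Optional[Cell]:
    """Build a nil-terminated list with `length` cells."""
    head = None
    for _ in range(length):
        head = Cell(head)
    return head


def instrumented(x: Optional[Cell], n1: int, n2: int) -> tuple[int, int, int]:
    """Run the instrumented loop; return (n1, n2, iterations) at halt."""
    iterations = 0
    while True:
        if x is not None:            # branch x != nil
            assert n2 > 0            # assume(n2 > 0)
            x = x.next
            n1, n2 = n1 + 1, n2 - 1
            iterations += 1
        else:                        # branch x = nil
            assert n2 == 0           # assume(n2 = 0)
            return n1, n2, iterations


def numeric_only(n1: int, n2: int) -> tuple[int, int, int]:
    """The same loop over integers only; expects n2 >= 0 initially."""
    iterations = 0
    while n2 > 0:                    # only the assume(n2 > 0) case can run
        n1, n2 = n1 + 1, n2 - 1
        iterations += 1
    assert n2 == 0                   # only the assume(n2 = 0) case can run
    return n1, n2, iterations
```

When the initial values satisfy the invariant (n1 = 0 and n2 equal to the length of the list at x), neither assertion fails, the two functions agree, and n2 serves as the ranking function.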

Alternate Translation We could also have inserted a branch on n2 using the Inst-Disj rule and then pruned inconsistent cases using the False rule. Recall that the original code was as below.

    L1 : ① branch x ≠ nil ⇒ ② x := x.next; ③ goto L1,
                  x = nil ⇒ ④ halt end

We start by noting that Γ(L1) = ∃x'. ls(n1, x', x) * ls(n2, x, nil) and this implies Q1 ∨ Q2 where Q1 and Q2 are defined as follows.

    Q1 = ∃x'. ls(n1, x', x) ∧ x = nil ∧ n2 = 0

    Q2 = ∃x', z. ls(n1, x', x) * (x ↦ [next: z]) * ls(n2 - 1, z, nil) ∧ n2 > 0
This was obtained by replacing ls(n2, x, nil) with its definition and distributing ∧ and * over disjunction. We can then use the Inst-Disj rule to insert a non-deterministic branch

    branch true ⇒ k̂1, true ⇒ k̂2 end

where k̂1 and k̂2 are chosen such that Γ ⊢ {Q1} k̂1 ▶_{n1,n2} ① and Γ ⊢ {Q2} k̂2 ▶_{n1,n2} ①. Our next step is to copy over the branch from the original program, obtaining the following partial instrumented program. In each branch case, we have indicated what the precondition at that location will be during the proof that this program is a valid instrumentation.

    L1 : {Q1 ∨ Q2} branch true ⇒ {Q1} branch x ≠ nil ⇒ {Q1 ∧ x ≠ nil} ...,
                                             x = nil ⇒ {Q1 ∧ x = nil} ... end,
                          true ⇒ {Q2} branch x ≠ nil ⇒ {Q2 ∧ x ≠ nil} ...,
                                             x = nil ⇒ {Q2 ∧ x = nil} ... end end

Thus, we get four cases, one for each combination of conditions from the two branches. Since the formulas Q1 ∧ x ≠ nil and Q2 ∧ x = nil are both equivalent to false, we can prune those branches with the False rule, obtaining the following.

    L1 : {Q1 ∨ Q2} branch true ⇒ {Q1} branch x ≠ nil ⇒ {false} halt,
                                             x = nil ⇒ {Q1 ∧ x = nil} ... end,
                          true ⇒ {Q2} branch x ≠ nil ⇒ {Q2 ∧ x ≠ nil} ...,
                                             x = nil ⇒ {false} halt end end

We  can  then  use  Inst-Assume  to  record  facts  about  n2,  obtaining 

    L1 : {Q1 ∨ Q2} branch true ⇒ {Q1} branch x ≠ nil ⇒ {false} halt,
                                             x = nil ⇒ {Q1 ∧ x = nil}
                                                       assume(n2 = 0); ... end,
                          true ⇒ {Q2} branch x ≠ nil ⇒ {Q2 ∧ x ≠ nil}
                                                       assume(n2 > 0); ...,
                                             x = nil ⇒ {false} halt end end
In this case, the use of Inst-Disj just described yields an instrumented program which is equivalent to the program we previously obtained from the simpler and more succinct method of inserting assume() statements with Inst-Assume. This will be the case whenever there are expressions over instrumented variables that are equivalent to each of the original branch conditions (as is the case with the expressions n2 = 0 and n2 > 0 and the branch conditions x = nil and x ≠ nil).

However, there are cases where Inst-Disj is necessary and the simpler method does not yield satisfactory results. This happens when the instrumented variables only allow us to express an under- or over-approximation of the original branch condition. For example, consider the condition x = y in a state satisfying ls(n, x, y). If n = 0 in this state, then x = y. But if n > 0 then x and y can still be equal if the list is cyclic. As such, n = 0 is an under-approximation of the condition x = y, but we have no corresponding under-approximation for x ≠ y. An instrumentation of a branch on x = y might then look like the following (we have added the assume() statements on n in a different location, but the procedure is otherwise the same as in the previous example). As before, we mark the inconsistent branch with the precondition {false}.

    L1 : {ls(n, x, y)} branch true ⇒ assume(n = 0); branch x = y ⇒ ...,
                                                           x ≠ y ⇒ {false} halt end,
                              true ⇒ assume(n > 0); branch x = y ⇒ ...,
                                                           x ≠ y ⇒ ... end end
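The cyclic case can be checked on a concrete heap. The Python sketch below is an illustration under assumptions: Cell and is_ls are hypothetical names, and is_ls encodes one plausible reading of ls(n, x, y), namely that n pairwise distinct cells lead from x to y along next. The exact definition of ls is the one given earlier in this document.

```python
from typing import Optional


class Cell:
    """A heap cell with a single `next` field."""

    def __init__(self) -> None:
        self.next: Optional["Cell"] = None


def is_ls(n: int, x: Optional[Cell], y: Optional[Cell]) -> bool:
    """Check that n pairwise distinct cells lead from x to y via `next`."""
    seen = set()
    cur = x
    for _ in range(n):
        if cur is None or id(cur) in seen:
            return False          # ran off the list or reused a cell
        seen.add(id(cur))
        cur = cur.next
    return cur is y


# A three-cell cycle a -> b -> c -> a.
a, b, c = Cell(), Cell(), Cell()
a.next, b.next, c.next = b, c, a
```

Here is_ls(0, a, a) and is_ls(3, a, a) both hold, so knowing n > 0 does not rule out x = y, while n = 0 does force x = y.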

In all of these examples, we used Inst-Disj to split on a disjunction that arose naturally from the disjunctive form of the definition of ls. We can also use Inst-Disj to case split on any predicate. Since the standard (non-separating) logical connectives in separation logic are classical in nature, we have the law of excluded middle and thus can always introduce the disjunction Q ∨ ¬Q for any formula Q. This then allows us to case split on an arbitrary Q at any point in the instrumented program. For example, we can branch on whether two variables are equal even if such an expression does not appear in the precondition or in the program text.
4.1.2 Properties

We  note  here  a  few  useful  properties  of  the  proof  system  given  in  Figure  4.1.  Of  course 
soundness  is  the  property  in  which  we  are  most  interested.  However  as  its  proof  is  the 
most  complex,  we  save  it  for  Section  4.3. 


Choice  of  Instrumentation  Variables 

The  proof  system  in  Figure  4.1  asks  us  to  choose  a  set  V  of  instrumentation  variables 
which  must  contain  all  the  variables  that  appear  free  in  the  instrumentation  commands. 
Intuitively,  this  set  need  only  mention  the  instrumentation  variables  that  are  actually  used 
by  the  instrumented  program.  This  is  captured  by  the  following  theorem. 

Theorem 19. If Γ ⊢ P̂ ▶_V P then Γ ⊢ P̂ ▶_V' P for V' = (fv(P̂) - fv(P)).

Proof. We will show that any derivation of Γ ⊢ P̂ ▶_V P can be transformed into a derivation of Γ ⊢ P̂ ▶_V' P. The Inst-Prog rule ensures that fv(P) ∩ V = ∅ and we proceed to transform the derivation of each Γ ⊢ {Γ(l)} P̂(l) ▶_V P(l) premise in Inst-Prog. The set V only participates in side conditions of rules and is unchanged as we move up the proof tree. We want to show that for each rule, replacing V by V' in the side condition still results in a valid derivation.

To take a representative case, consider the Inst-Exists rule. We have x ∈ V. We must show that x ∈ V'. Clearly x ∈ fv(P̂) as (x := ?; k̂) is a sub-term of the instrumented program P̂. Then x ∈ V' provided that x ∉ fv(P). But we have that fv(P) ∩ V = ∅, thus x ∈ V implies x ∉ fv(P). The other cases are similar. □

We  also  have  that  if  V  is  sufficient  to  show  instrumentation,  then  any  extension  of  V 
is  also  sufficient. 

Theorem 20. If Γ ⊢ P̂ ▶_V P then for all V' ⊇ V such that V' ∩ fv(P) = ∅ we have Γ ⊢ P̂ ▶_V' P.

Proof. The proof is by induction on the derivation of Γ ⊢ P̂ ▶_V P. For the rule Inst-Prog we need to show that fv(P) ∩ V' = ∅ and ∀l ∈ dom(P). (Γ ⊢ {Γ(l)} P̂(l) ▶_V' P(l)). The first is given as an assumption, the second is proved by induction on the derivation. Specifically, we show that for all k̂, k and V' ⊇ V, if Γ ⊢ {Q} k̂ ▶_V k holds, then so does Γ ⊢ {Q} k̂ ▶_V' k.

Examining the rules in Figure 4.1 we see that only Inst-Assign and Inst-Exists involve conditions on the set of instrumentation variables. For the other rules, our goal will follow immediately from the inductive hypothesis. Suppose that Inst-Assign was the last rule applied, so that the conclusion of the derivation is Γ ⊢ {Q} (x := e; k̂) ▶_V k. Then we have {Q} x := e {Q'}, Γ ⊢ {Q'} k̂ ▶_V k and x ∈ V. From the last condition and V' ⊇ V we have x ∈ V'. The inductive hypothesis gives us Γ ⊢ {Q'} k̂ ▶_V' k. These last two together with {Q} x := e {Q'} are then sufficient to apply Inst-Assign with V' as the set of instrumentation variables, obtaining Γ ⊢ {Q} (x := e; k̂) ▶_V' k, which is our goal.

The case for Inst-Exists is similar, as again the only condition on V is the side condition that x ∈ V. □

Combined, these theorems indicate that the use of V in the inference system is merely a notational convenience. It could be derived, up to extension, from the free variables of P and P̂.

Weakening Γ

For an instrumentation of a given continuation, Γ can always be weakened (this is not the case at the level of programs, however).

Lemma 12. If Γ ⊢ {Q} k̂ ▶_V k and ∀l. Γ(l) ⇒ Γ'(l) then

    Γ' ⊢ {Q} k̂ ▶_V k

Proof. We show how to transform a derivation of Γ ⊢ {Q} k̂ ▶_V k into a derivation of Γ' ⊢ {Q} k̂ ▶_V k. For all the rules in the derivation except Goto, we can simply replace Γ by Γ'. The rule will still be valid. For Goto, which is the only rule in Figure 4.1 that involves a condition on Γ, we make the following change. The Goto rule is reproduced below.

    Γ(l) = Q
    ------------------------------ Goto
    Γ ⊢ {Q} goto l ▶_V goto l

As the equality in Γ(l) = Q is syntactic equality, any instance of Goto has the form below.

    ---------------------------------- Goto
    Γ ⊢ {Γ(l)} goto l ▶_V goto l

These rule instances are each replaced with the following derivation, which uses our assumption Γ(l) ⇒ Γ'(l).

                       ------------------------------------ Goto
    Γ(l) ⇒ Γ'(l)       Γ' ⊢ {Γ'(l)} goto l ▶_V goto l
    ------------------------------------------------------- Strengthening
    Γ' ⊢ {Γ(l)} goto l ▶_V goto l

□


Over-approximation  of  Reachable  States 

The manner in which the preconditions in Figure 4.1 are transformed is reminiscent of Hoare-logic reasoning. And in fact, it is the case that these formulae always over-approximate the reachable states at the corresponding point in the execution of the instrumented program, just as Hoare-style pre- and post-conditions do. We show this now, beginning with the following lemma.

Lemma 13. Suppose that Γ ⊢ {Q} k̂ ▶_V k holds and (s, h) ⊨ Q. Then for all s', h', l' we have that

    (k̂, (s, h)) →*_P̂ goto(l', (s', h'))

implies (s', h') ⊨ Γ(l').

The proof is by induction on the derivation of Γ ⊢ {Q} k̂ ▶_V k and in each inductive case involves checking that if the instrumented command in the conclusion of a rule takes a single step from a state satisfying the precondition, then the precondition in the premise holds of the post-state. We do not give a full proof here since the proof of soundness also involves checking this property of the rules. For details, see Section 4.3.

We  can  now  show  that  the  preconditions  over-approximate  the  reachable  states. 

Theorem 21. If Γ ⊢ P̂ ▶_V P and (s, h) ⊨ Γ(initloc(P̂)) and

    goto(initloc(P̂), (s, h)) →+_P̂ goto(l', (s', h'))

then (s', h') ⊨ Γ(l').

Let l0 = initloc(P̂). If Γ ⊢ P̂ ▶_V P holds, then we have Γ ⊢ {Γ(l0)} P̂(l0) ▶_V P(l0). This together with our assumption (s, h) ⊨ Γ(l0) allows us to apply Lemma 13, thus obtaining that goto(initloc(P̂), (s, h)) →+_P̂ goto(l', (s', h')) implies (s', h') ⊨ Γ(l'), as desired.

Inversion 

Since there is only one rule for proving Γ ⊢ P̂ ▶_V P, we have the following inversion lemma.

Lemma 14. If Γ ⊢ P̂ ▶_V P then all the following hold

1. dom(P̂) = dom(P)

2. fv(P) ∩ V = ∅

3. initloc(P̂) = initloc(P)

4. ∀l ∈ dom(P). (Γ ⊢ {Γ(l)} P̂(l) ▶_V P(l))

We also have that all judgments appearing in the proof involve sub-terms of the program P in the position following the ▶ symbol.

Lemma 15. If D is a sub-derivation of Γ ⊢ P̂ ▶_V P with conclusion Γ ⊢ {Q} k̂ ▶_V k then k is a sub-term of P.

Proof. The proof is by induction on the derivation of Γ ⊢ P̂ ▶_V P. We check each rule in the system given in Figures 4.1 and 4.2 and verify that if the conclusion has the form Γ ⊢ {Q} k̂ ▶_V k and a premise has the form Γ ⊢ {Q'} k̂' ▶_V k' then k' is a sub-term of k. □

Corollary 3. If D is a sub-derivation of Γ ⊢ P̂ ▶_V P with conclusion Γ ⊢ {Q} k̂ ▶_V k then V ∩ fv(k) = ∅.

Proof. Since Γ ⊢ P̂ ▶_V P holds, we have V ∩ fv(P) = ∅ from Lemma 14. By Lemma 15 we have that k is a sub-term of P. Thus, fv(k) ⊆ fv(P). Combining these facts gives us that V ∩ fv(k) = ∅. □

4.1.3  Derived  Rules 

We now discuss certain rules which are derived in the sense that, given their premises, their conclusion can be constructed by the use of existing rules. Such rules capture common reasoning patterns and thus we will often use them directly in proofs. Often the instrumented program in the conclusion of the rule is equivalent to another, simpler, instrumented program in the sense that they produce sets of execution traces that are stuttering equivalent. In such cases we will note this and adopt the rule with the simplified conclusion. Note that this simplification step is not usually part of the process of generating derived rules. Thus, these are more accurately described as "simplifications of derived rules"; however, we adopt the term "derived rule" for conciseness.

Case Split with Conditions In the previous section, we repeatedly encountered continuations with the following structure.

    k̂ = branch true ⇒ assume(e1); k̂1,
               true ⇒ assume(e2); k̂2 end




Such a pattern corresponds to the derivation given in Figure 4.5. The code above is equivalent to the following.

    k̂' = branch e1 ⇒ k̂1,
                e2 ⇒ k̂2 end

To see why, consider the traces of k̂. These have one of two forms. Either they fit the pattern

    (k̂, (s, h)) ((assume(e1); k̂1), (s, h)) T1

where (s, h) ⊨ e1 and T1 is a trace of k̂1 starting from s, h, or they are of the form

    (k̂, (s, h)) ((assume(e2); k̂2), (s, h)) T2

where (s, h) ⊨ e2 and T2 is a trace of k̂2 starting from s, h.

The traces of k̂' are stuttering equivalent to these with respect to the equivalence relation =, which is the equivalence relation on states that allows the current continuation to differ but otherwise requires the states to match (a full definition is given on page 89). The traces of k̂' have the form

    (k̂', (s, h)) T1

and

    (k̂', (s, h)) T2

These differ from the traces of k̂ only in that the traces of k̂ contain one more repetition of the memory state s, h.
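The comparison of traces up to repeated memory states can be sketched in a few lines of Python. This is a simplified illustration, not the definition referred to above: it handles only finite traces, represents a trace as a list of (continuation, memory state) pairs, and the names destutter and stuttering_equivalent are hypothetical.

```python
from itertools import groupby


def destutter(trace):
    """Project a trace onto memory states and collapse consecutive repeats."""
    states = [state for _continuation, state in trace]
    return [state for state, _run in groupby(states)]


def stuttering_equivalent(trace_a, trace_b):
    """Compare two finite traces up to repeated memory states."""
    return destutter(trace_a) == destutter(trace_b)


s0 = (("x", 1),)   # a memory state, here just an immutable tuple
s1 = (("x", 2),)

# Trace of the form with assume: one extra step that leaves memory unchanged.
trace_k = [("k", s0), ("assume(e1); k1", s0), ("k1", s0), ("rest of k1", s1)]
# Trace of the simplified form branching directly on e1.
trace_k_prime = [("k'", s0), ("k1", s0), ("rest of k1", s1)]
```

Both traces collapse to the state sequence [s0, s1], so stuttering_equivalent(trace_k, trace_k_prime) holds even though trace_k has one more entry.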

Collecting the premises in the derivation in Figure 4.5 and using the simplified continuation k̂' as the conclusion gives us the following derived rule.

    Q ⇒ e1 ∨ e2     Γ ⊢ {Q ∧ e1} k̂1 ▶_V k     Γ ⊢ {Q ∧ e2} k̂2 ▶_V k
    --------------------------------------------------------------------- Inst-Branch
    Γ ⊢ {Q} branch e1 ⇒ k̂1, e2 ⇒ k̂2 end ▶_V k

This lets us directly branch on pure conditions present in a disjunctive precondition.

Derivation, read from the leaves to the root. Premises marked [boxed] are boxed in the figure.

    I-A:    from Q ∧ e1 ⇒ e1 and Γ ⊢ {Q ∧ e1} k̂1 ▶_V k [boxed],
            Γ ⊢ {Q ∧ e1} (assume(e1); k̂1) ▶_V k

    I-A:    from Q ∧ e2 ⇒ e2 and Γ ⊢ {Q ∧ e2} k̂2 ▶_V k [boxed],
            Γ ⊢ {Q ∧ e2} (assume(e2); k̂2) ▶_V k

    (unlabeled) from the two judgments above,
            Γ ⊢ {(Q ∧ e1) ∨ (Q ∧ e2)} branch true ⇒ assume(e1); k̂1, true ⇒ assume(e2); k̂2 end ▶_V k

    Str:    from Q ⇒ (Q ∧ e1) ∨ (Q ∧ e2), which follows from Q ⇒ e1 ∨ e2 [boxed], and the judgment above,
            Γ ⊢ {Q} branch true ⇒ assume(e1); k̂1, true ⇒ assume(e2); k̂2 end ▶_V k

Figure 4.5: Derivation corresponding to the insertion of a case split on e1 ∨ e2. The premises that become premises of the derived rule are boxed (the other two premises are tautologies). We abbreviate Strengthening as Str and Inst-Assume as I-A. The unlabeled rule is an instance of Inst-Disj.

Branch Translation We can build on the Inst-Branch rule given previously to derive a rule that lets us translate branch conditions in one step when the conditions have an exact analogue in terms of instrumentation variables. To take an example, in the case of complete lists of the form ls(n, x, nil) (that is, lists of length n starting at x and ending at nil) we have that ls(n, x, nil) ∧ n = 0 ⇔ ls(n, x, nil) ∧ x = nil. Thus, in a state in which we have ls(n, x, nil), knowing that n = 0 tells us just as much as knowing that x = nil.

The derivation given in Figure 4.6 forms the basis of the derived rule. We then, as in the previous case, simplify the conclusion. However, the argument that such a simplification is permitted is more complicated in this case. We would like to take the following

    k̂ = branch e_1 ⇒ assume(e'_1); k̂_1, ..., e_n ⇒ assume(e'_n); k̂_n end

and reduce it to the continuation below.

    k̂' = branch e'_1 ⇒ k̂_1, ..., e'_n ⇒ k̂_n end

The problem is that these two continuations are only equivalent for initial states (s, h) in which (s, h) ⊨ e'_i implies (s, h) ⊨ e_i.

If this implication holds, then the traces of k̂ have the following form

    (k̂, (s, h)) ((assume(e'_i); k̂_i), (s, h)) T_i

where (s, h) ⊨ e_i and (s, h) ⊨ e'_i and T_i is a trace of k̂_i. The traces of k̂' have the form

    (k̂', (s, h)) T_i

where (s, h) ⊨ e'_i. If (s, h) ⊨ e'_i implies (s, h) ⊨ e_i, then these two sets of traces are stuttering equivalent (for each trace of k̂ there is a matching trace of k̂' and vice-versa).

To ensure that the above simplification is always valid then, we require that Q ∧ e'_i ⇒ e_i. This, combined with the fact that Q is an over-approximation of the reachable states at this point in the execution, ensures that the continuation will only be executing in contexts in which for all s, h we have (s, h) ⊨ e'_i implies (s, h) ⊨ e_i and the replacement is valid. This leaves us with the rule below. Note that since the derivation in Figure 4.6 requires that (Q ∧ e_i) ⇒ e'_i and the rule for simplifying the conclusion requires that (Q ∧ e'_i) ⇒ e_i, this forces the assumption that (Q ∧ e_i) ⇔ (Q ∧ e'_i) in the final rule.

    ∀i. (Q ∧ e_i) ⇔ (Q ∧ e'_i)     ∀i. (Γ ⊢ {Q ∧ e_i} k̂_i ▶_V k_i)
    ------------------------------------------------------------------- Inst-BranchTrans
    Γ ⊢ {Q} (branch e'_1 ⇒ k̂_1, ..., e'_n ⇒ k̂_n end) ▶_V (branch e_1 ⇒ k_1, ..., e_n ⇒ k_n end)


Derivation, read from the leaves to the root. Premises marked [boxed] are boxed in the figure.

    Inst-Assume:  from (Q ∧ e_i) ⇒ e'_i [boxed] and Γ ⊢ {Q ∧ e_i} k̂_i ▶_V k_i [boxed],
                  Γ ⊢ {Q ∧ e_i} (assume(e'_i); k̂_i) ▶_V k_i

    ∀i:           the judgment above holds for each value of i

    Branch:       Γ ⊢ {Q} (branch e_1 ⇒ assume(e'_1); k̂_1, ..., e_n ⇒ assume(e'_n); k̂_n end)
                      ▶_V (branch e_1 ⇒ k_1, ..., e_n ⇒ k_n end)

Figure 4.6: Derivation corresponding to the translation of branch conditions into conditions on instrumentation variables. In the rule labeled ∀i, the premise holds for each value of i. The premises that become premises of the derived rule are boxed. We require that they hold for each i ∈ {1, ..., n}.

Assignment We took as primitive the Inst-Assign rule. Having a succinct rule for updating instrumentation variables is useful, as this operation occurs quite frequently. However, as we will see in this section, this rule is actually derivable from the others. Figure 4.7 gives the derivation for the simpler case where we are inserting the instrumentation command x := e and x ∉ fv(e). We can then derive the more general rule with the commonly-used trick of inserting a temporary variable (transforming x := e into y := e; x := y where y is a fresh variable).

Essentially, the derivation relies on the fact that we can use the Strengthening rule to reason forward from our precondition Q, obtaining the sequence of implications Q ⇒ ∃x. Q ⇒ ∃x'. Q[x'/x]. This allows us to perform the quantification of the previous value of x that occurs in the forward reasoning rule for x := e in Hoare logic. We then note that, since our semantics of expressions is total, if e does not contain x then ∃x. x = e is a tautology, allowing us to conclude

    (∃x'. Q[x'/x]) ∧ (∃x. x = e)

Since x is not free in ∃x'. Q[x'/x], we can extend the scope of the quantifier on x, obtaining

    ∃x. (∃x'. Q[x'/x]) ∧ x = e

We can then use the Inst-Exists rule to add the command x := ? and obtain the precondition

    (∃x'. Q[x'/x]) ∧ x = e

which allows us to insert assume(x = e) with the Inst-Assume rule.

The derivation in Figure 4.7 also makes use of the fact that {Q} x := e {Q'} implies ∃x'. (Q[x'/x] ∧ (x = e[x'/x])) ⇒ Q'. This holds because ∃x'. (Q[x'/x] ∧ (x = e[x'/x])) is the strongest post-condition of x := e with respect to the precondition Q. If x ∉ fv(e) then e[x'/x] = e and the strongest post-condition is simply ∃x'. Q[x'/x] ∧ x = e.

Collecting the premises and side-conditions from the derivation in Figure 4.7 we obtain the following derived rule for assignments (note that we have also simplified $x \notin \mathit{fv}(Q[x'/x], e)$ to $x \notin \mathit{fv}(e)$ since $Q[x'/x]$ cannot contain x).

$$\frac{\{Q\}\ x := e\ \{Q'\} \qquad \Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k}{\Gamma \vdash \{Q\}\ (x := ?;\ \mathsf{assume}(x = e);\ \hat{k}) \blacktriangleright_V k} \quad x \in V,\ x \notin \mathit{fv}(e)$$


We can then prove that if $x \notin \mathit{fv}(e)$ then $(x := ?;\ \mathsf{assume}(x = e);\ \hat{k})$ is stuttering equivalent to $(x := e;\ \hat{k})$. Let $k$ be the first continuation and $k'$ be the second. The traces of $k$ have the form

$$(k, (s, h))\ \ ((\mathsf{assume}(x = e);\ \hat{k}), (s', h))\ \ T$$

where $s' = s[x \mapsto v]$ for some v and $(s', h) \models (x = e)$ and T is a trace of $\hat{k}$ starting from $(s', h)$. The traces of $k'$ have the form

$$(k', (s, h))\ \ (\hat{k}, (s[x \mapsto [\![e]\!]\, s], h))\ \ T$$

The traces are stuttering equivalent (with respect to =) provided we can show that $s' = s[x \mapsto [\![e]\!]\, s]$. The fact that $(s', h) \models (x = e)$ implies $s'(x) = [\![e]\!]\, s'$, and since $x \notin \mathit{fv}(e)$ and $s'$ differs from $s$ only at x, we have $[\![e]\!]\, s' = [\![e]\!]\, s$. Combined with the fact that $s' = s[x \mapsto v]$, this tells us that $v = [\![e]\!]\, s$ and thus $s' = s[x \mapsto [\![e]\!]\, s]$ as desired.
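The argument above can be tried out on a small finite model. The sketch below is only an illustration: the expression `eval_e`, the bounded search range, and both function names are assumptions made for this example and are not part of the proof system. It enumerates the values that x := ? could pick, keeps those that pass the assume, and contrasts the case where x occurs in e.

```c
/* Hypothetical expression e = 2*y + 1; x does not occur in it. */
static int eval_e(int y) { return 2 * y + 1; }

/* Models x := ?; assume(x = e) by trying every v in [lo, hi] for x.
 * Returns the number of choices that survive the assume and stores
 * the surviving value of x in *x_out. */
int havoc_then_assume(int y, int lo, int hi, int *x_out)
{
    int survivors = 0;
    for (int v = lo; v <= hi; v++) {
        int x = v;                    /* x := ? picked v */
        if (x == eval_e(y)) {         /* assume(x = e) lets this run continue */
            *x_out = x;
            survivors++;
        }
    }
    return survivors;
}

/* Same experiment with e = x + 1, where x does occur in e:
 * no choice of v satisfies v = v + 1, so every execution is blocked.
 * hi is assumed to be well below INT_MAX so that v + 1 cannot overflow. */
int havoc_then_assume_self(int lo, int hi)
{
    int survivors = 0;
    for (int v = lo; v <= hi; v++) {
        if (v == v + 1)
            survivors++;
    }
    return survivors;
}
```

When x does not occur in e, exactly one choice survives and it equals the value that x := e would have stored. When x occurs in e, nothing survives, which is why the side condition on fv(e) is needed before the simplification.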

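The above argument allows us to simplify the instrumented continuation in the conclusion, obtaining the following rule.

$$\text{Inst-Assign-NotFree}\quad \frac{\{Q\}\ x := e\ \{Q'\} \qquad \Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k}{\Gamma \vdash \{Q\}\ (x := e;\ \hat{k}) \blacktriangleright_V k} \quad x \in V,\ x \notin \mathit{fv}(e)$$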


This then gives us all the machinery necessary to replicate the Inst-Assign rule. Suppose we had the proof system in Figure 4.1, but without the Inst-Assign rule and we wanted to insert the assignment x := e, where x is an instrumentation variable. Then we could select an instrumentation variable y which is not otherwise used (by Theorem 20 this can always be done) and insert the commands y := e; x := y using the Inst-Assign-NotFree rule.

1. From $\{Q\}\ x := e\ \{Q'\}$ (boxed) and $x \notin \mathit{fv}(e)$ (boxed): $((\exists x'.\, Q[x'/x]) \wedge x = e) \Rightarrow Q'$.
2. From step 1 and $\Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k$ (boxed), by Strengthening: $\Gamma \vdash \{(\exists x'.\, Q[x'/x]) \wedge x = e\}\ \hat{k} \blacktriangleright_V k$.
3. From step 2 and $((\exists x'.\, Q[x'/x]) \wedge x = e) \Rightarrow x = e$, by I-A: $\Gamma \vdash \{(\exists x'.\, Q[x'/x]) \wedge x = e\}\ (\mathsf{assume}(x = e);\ \hat{k}) \blacktriangleright_V k$.
4. From step 3 and $x \in V$ (boxed), by I-E: $\Gamma \vdash \{\exists x.\, (\exists x'.\, Q[x'/x]) \wedge x = e\}\ (x := ?;\ \mathsf{assume}(x = e);\ \hat{k}) \blacktriangleright_V k$.
5. From $x' \notin \mathit{fv}(Q)$ and $x \notin \mathit{fv}(e)$: $Q \Rightarrow \exists x.\, (\exists x'.\, Q[x'/x]) \wedge x = e$.
6. From steps 4 and 5, by Strengthening: $\Gamma \vdash \{Q\}\ (x := ?;\ \mathsf{assume}(x = e);\ \hat{k}) \blacktriangleright_V k$.

Figure 4.7: Derivation of the Inst-Assign rule for the case where $x \notin \mathit{fv}(e)$. The formulas and conditions that become premises and side conditions in the derived rule are boxed. The unboxed formulas can always be made to hold, either because they are tautologies or, in the case of $x' \notin \mathit{fv}(Q)$, because we get to choose $x'$ when constructing the derivation. I-A stands for Inst-Assume, I-E stands for Inst-Exists. All other rules are instances of Strengthening.

4.2  Example 

Before examining in more detail the theory behind instrumented programs, we first consider a concrete example. Consider the C program in Figure 4.8. This program advances a pointer r through an ordered binary tree, searching for the value v. It returns 1 if the value is found and 0 otherwise. Suppose we want to verify that this program terminates.

The usual method for showing this is to produce a ranking function, which is a function from program states to some well-founded set (often a bounded subset of the integers). For programs not involving the heap, these ranking functions can be given as functions of the program variables. However, for programs that manipulate heap-based data structures, these functions may involve properties of the heap.

int mem(TreePointer r, int v) {
    int u;

    while (r != 0) {
        u = r->data;
        if (u == v)
            return 1;
        else if (u < v)
            r = r->right;
        else
            r = r->left;
    }

    return 0;
}

Figure 4.8: C code implementing a membership query for an ordered binary tree.


This  is  the  case  for  our  example.  We  cannot  write  a  ranking  function  for  the  loop 
that  is  given  solely  in  terms  of  program  variables.  The  quantity  that  is  decreasing  at  each 
iteration  is  the  size  of  the  sub-tree  at  r,  which  does  not  have  an  explicit  representation  in 
the  program.  As  such,  standard  termination  tools  cannot  be  applied  to  this  example  and  we 
might  think  that  any  method  for  constructing  a  ranking  function  for  this  example  would 
have  to  be  heap-aware. 
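For contrast, a loop over integers alone has a ranking function that can be written directly over its program variables. The following is a minimal sketch of that situation. The function `count_up` and the rank `n - i` are assumptions chosen for illustration and do not come from the thesis.

```c
#include <assert.h>

/* Counts up from i to n. The ranking function is n - i: it is
 * non-negative inside the loop and strictly decreases per iteration. */
int count_up(int i, int n)
{
    int steps = 0;
    while (i < n) {
        int rank_before = n - i;      /* value of the ranking function */
        i++;
        assert(n - i >= 0 && n - i < rank_before);
        steps++;
    }
    return steps;
}
```

No comparable expression over r, u, and v exists for the tree search, because the decreasing quantity lives in the heap.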

What we show in this section (and in the thesis in general) is that by constructing an appropriate instrumented version of the code, we can provide explicit information regarding the counts involved in the termination argument. This provides a standard termination tool with the components it needs to construct a ranking function and allows the rank function synthesis to be done with no knowledge of the underlying heap-based data structures.

We begin by translating the C program into our program format. The result of this translation is given in Figure 4.9. We include a variable “return” that models the return value of the function.

loop : (1) branch r = nil => (2) return := 0; halt,
                  r ≠ nil => (3) u := r.data;
                             (4) branch u = v => (5) return := 1; halt,
                                        u < v => (6) r := r.right; goto loop,
                                        u > v => (7) r := r.left; goto loop

Figure 4.9: The program from Figure 4.8 translated into our program notation, with control points numbered.

To  produce  the  instrumented  version,  we  need  a  means  of  describing  the  contents  of 
the  heap.  This  is  provided  by  the  following  definition  of  binary  trees.  Here,  n  represents 
the  number  of  nodes  in  the  tree. 

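$$\begin{aligned}
\mathit{tree}(n, r) ={} & (n = 0 \wedge r = \mathsf{nil} \wedge \mathsf{emp}) \\
\vee{} & (n > 0 \wedge \exists n_1, n_2.\ (n = n_1 + n_2 + 1) \wedge {} \\
& \quad (\exists lc, rc, m.\ (r \mapsto [\mathsf{left} : lc,\ \mathsf{right} : rc,\ \mathsf{data} : m]) * \mathit{tree}(n_1, lc) * \mathit{tree}(n_2, rc)))
\end{aligned}$$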

An instrumented version of the search program is given in Figure 4.10. The loop invariant is $\mathit{tree}(n, r) * \mathsf{true}$, which indicates that there is a binary search tree at r consisting of n separate nodes (where a “node” is a pointer cell of the form $x \mapsto [\mathsf{left} : a,\ \mathsf{right} : b,\ \mathsf{data} : c]$). The “$* \mathsf{true}$” portion indicates that the heap may also contain other cells. For a more complete analysis of this program, we would want to define a predicate describing a “tree with a hole” (similar to the approach taken in Calcagno et al. [2005]) in order to track these other cells more precisely, as this information is needed to conclude that the heap still contains a tree when the function returns.

We have annotated the instrumented program with invariants at key locations, showing the value of Q that would be used in the proof of $\Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k$ at that point.

loop : {tree(n, r) * true}
  (1) branch
        n = 0 => (2) return := 0; halt
        n > 0 => (3)
                 {∃n1, n2. Q} n1 := ?; n2 := ?;
                 {Q} assume(n = n1 + n2 + 1);
                 {Q} u := r.data;
                 (4) branch
                       u = v => (5) return := 1; halt,
                       u < v => (6) r := r.left;
                                {tree(n1, r) * true} n := n1;
                                {tree(n, r) * true} goto loop
                       u > v => (7) r := r.right;
                                {tree(n2, r) * true} n := n2;
                                {tree(n, r) * true} goto loop
                     end
      end

Q = ∃lc, rc, m. (r ↦ [left : lc, right : rc, data : m] * tree(n1, lc) * tree(n2, rc) * true) ∧ (n = n1 + n2 + 1)

Figure 4.10: Instrumented version of the program in Figure 4.9.




The  main  branch  on  r  =  nil  is  transformed  into  an  equivalent  branch  on  n  =  0  by  the 
Inst-BranchTrans  derived  rule  from  Section  4.1.3.  Other  commands  are  added  via  the 
Inst-Assume,  Inst-Exists,  and  Inst-Assign  rules. 

The program first branches on the instrumentation variable n, which represents the number of nodes in the tree rooted at r. In the case where the tree is empty, we return. In the case where the tree is non-empty, it is expanded into its left and right child, whose sizes summed plus one equals n. When we reach the end of this case, having advanced r to the appropriate child, the instrumentation command $n := n_i$ is inserted (where i = 1 or i = 2 depending on the child that was chosen). This updates n to contain the number of nodes in the sub-tree that is now pointed to by r.

To  show  termination,  we  can  focus  on  the  changes  to  n.  We  see  that  in  all  paths  through 
the  loop,  either  we  halt  or  n  strictly  decreases.  As  n  is  bounded  below  by  0,  this  ensures 
termination  of  the  loop. 
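The termination argument can be exercised on concrete trees. The sketch below is an executable approximation of the instrumented search, not the instrumentation itself: the `struct node` layout and the helper `tree_size` are assumptions, and the ghost values n1 and n2 are computed from the heap here, whereas in the instrumented program they are chosen non-deterministically and then constrained by the assume. The branch directions follow the C code in Figure 4.8.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *left, *right; int data; };

/* Number of nodes reachable from r (the n of tree(n, r)). */
static long tree_size(const struct node *r)
{
    return r == NULL ? 0 : tree_size(r->left) + tree_size(r->right) + 1;
}

/* Search with ghost variables n, n1, n2 mirroring the instrumentation.
 * *iterations receives the number of completed loop iterations. */
int mem_instrumented(const struct node *r, int v, long *iterations)
{
    long n = tree_size(r);              /* ghost: tree(n, r) holds on entry */
    *iterations = 0;
    while (n != 0) {                    /* branch on n = 0 instead of r = nil */
        long n1 = tree_size(r->left);   /* stands in for n1 := ? */
        long n2 = tree_size(r->right);  /* stands in for n2 := ? */
        assert(n == n1 + n2 + 1);       /* the assume holds for these values */
        long old_n = n;
        int u = r->data;
        if (u == v)
            return 1;
        else if (u < v) { r = r->right; n = n2; }
        else            { r = r->left;  n = n1; }
        assert(0 <= n && n < old_n);    /* n is bounded below and decreases */
        ++*iterations;
    }
    assert(r == NULL);                  /* n = 0 agrees with r = nil */
    return 0;
}
```

A termination prover working on the numeric program sees only n, n1, and n2. The assertions here check, on particular inputs, the same facts it would establish symbolically.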

Note that the commands $n_1 := ?$, $n_2 := ?$, and $\mathsf{assume}(n = n_1 + n_2 + 1)$ have the effect of ensuring that, regardless of whether the left child (with size $n_1$) or the right child (size $n_2$) is chosen, the size of the tree at r decreases. The non-deterministic choice commands assign new, arbitrary values to $n_1$ and $n_2$ and then the assume statement ensures that only values that satisfy the relationship between the sizes are considered (the assume allows us to disregard executions where unsatisfactory values of $n_1$ and $n_2$ are chosen).

If the assume statement were not present, the program in Figure 4.10 would still be a valid instrumentation according to the rules in Figure 4.1. However, it would have executions that we know are not possible (namely, executions where $n_1$ and $n_2$ do not satisfy $n = n_1 + n_2 + 1$). These extra paths must be considered by subsequent analyses and, in this case, the absence of the constraint $n = n_1 + n_2 + 1$ would prevent a termination analysis from showing that the instrumented program terminates.
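The effect of dropping the assume can be checked by brute force over a bounded range. This is a sketch under assumptions: the bound, the function name, and in particular the restriction of $n_1$ and $n_2$ to non-negative values (they stand for sub-tree sizes) are choices made for this illustration.

```c
#include <stdbool.h>

/* Explores all choices of n1, n2 in [0, bound] for each n in [1, bound].
 * Non-negativity of n1 and n2 is an assumption (they are sub-tree sizes).
 * When use_assume is true, choices violating n = n1 + n2 + 1 are discarded,
 * as assume(n = n1 + n2 + 1) would do. Returns true if both n := n1 and
 * n := n2 strictly decrease n on every execution that is kept. */
bool size_always_decreases(int bound, bool use_assume)
{
    for (int n = 1; n <= bound; n++) {
        for (int n1 = 0; n1 <= bound; n1++) {
            for (int n2 = 0; n2 <= bound; n2++) {
                if (use_assume && n != n1 + n2 + 1)
                    continue;           /* execution blocked by the assume */
                if (n1 >= n || n2 >= n)
                    return false;       /* n would not decrease */
            }
        }
    }
    return true;
}
```

With the assume, every remaining choice decreases n. Without it, a choice such as n = 1, n1 = 1 is kept and the decrease fails, which is the kind of spurious path a termination analysis would then have to consider.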

4.2.1  Alternate  Size  Measures 

We just presented a treatment of trees where the notion of size corresponded to the number of nodes in the tree. Trees also admit other notions of size (tree height, for example), and this is true of most data structures. Even singly-linked lists of integers admit multiple notions of size. One may be interested in tracking the length of the list, the maximal value contained in the list, or the sum of all values contained in the list, to name just a few. The rules presented in Figure 4.1 permit reasoning about any of these notions of size. Any quantity whose update relation can be represented using the expression language can be tracked by inserting instrumentation commands in the manner discussed previously.

As an example, if we want to track the height of a tree, we could use the definition below.

$$\begin{aligned}
\mathit{treeh}(h, r) ={} & (h = 0 \wedge r = \mathsf{nil}) \\
\vee{} & (h > 0 \wedge \exists h_1, h_2, m.\ (h_1 < h) \wedge (h_2 < h) \wedge (h = h_1 + 1 \vee h = h_2 + 1) \wedge {} \\
& \quad \exists lc, rc.\ r \mapsto [\mathsf{left} : lc,\ \mathsf{right} : rc,\ \mathsf{data} : m] * \mathit{treeh}(h_1, lc) * \mathit{treeh}(h_2, rc))
\end{aligned}$$

Here we use the constraint $(h_1 < h) \wedge (h_2 < h) \wedge (h = h_1 + 1 \vee h = h_2 + 1)$ to ensure that if $h_1$ and $h_2$ are the heights of the left and right sub-trees, then h is the height of the full tree. If our expression language had a function max of type $\mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$ that returned the greater of its two arguments, then we could represent this constraint more succinctly as $h = \max(h_1, h_2) + 1$.
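The two ways of stating the height constraint can be compared directly. The following sketch assumes a C layout `struct node` for the tree cells; `tree_height` and `height_constraint` are hypothetical helper names introduced only for this example.

```c
#include <stddef.h>

struct node { struct node *left, *right; int data; };

static int max_int(int a, int b) { return a > b ? a : b; }

/* Height in the sense of treeh: 0 for nil, max(h1, h2) + 1 otherwise. */
int tree_height(const struct node *r)
{
    if (r == NULL)
        return 0;
    return max_int(tree_height(r->left), tree_height(r->right)) + 1;
}

/* The max-free form of the same constraint, as used in treeh:
 * (h1 < h) and (h2 < h) and (h = h1 + 1 or h = h2 + 1). */
int height_constraint(int h, int h1, int h2)
{
    return h1 < h && h2 < h && (h == h1 + 1 || h == h2 + 1);
}
```

For any h1 and h2, the only h accepted by `height_constraint` is `max_int(h1, h2) + 1`, which is why the definition does not need max in the expression language.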

We can also specify more abstract notions of size. For example, below is the same tree definition, but with argument a representing an abstract notion of size, rather than a particular size measure.

$$\begin{aligned}
\mathit{treea}(a, r) ={} & (a = 0 \wedge r = \mathsf{nil}) \\
\vee{} & (a > 0 \wedge \exists a_1, a_2.\ (a_1 < a) \wedge (a_2 < a) \wedge {} \\
& \quad \exists lc, rc.\ r \mapsto [\mathsf{left} : lc,\ \mathsf{right} : rc] * \mathit{treea}(a_1, lc) * \mathit{treea}(a_2, rc))
\end{aligned}$$

The specific size measures discussed previously, number of nodes and height, would both satisfy this definition. That is, if treeh is the tree predicate that tracks height and tree is the predicate that specifies the number of nodes and treea is the definition above, then we have

$$\mathit{tree}(h, r) \Rightarrow \mathit{treea}(h, r)$$
$$\mathit{treeh}(h, r) \Rightarrow \mathit{treea}(h, r)$$

This follows from the fact that the update relation for tree is contained in the update relation for treea, and similarly for treeh. More specifically, we can view the pure constraint on sizes as a relation between “size of the entire tree,” “size of the left sub-tree,” and “size of the right sub-tree.” If we then write $s$, $s_l$, and $s_r$ for these quantities, thus unifying our variable notation, we get an update relation of $s = s_l + s_r + 1$ for tree and $(s_l < s) \wedge (s_r < s)$ for treea. The fact that for $s_l, s_r \geq 0$ we have $(s = s_l + s_r + 1) \Rightarrow (s_l < s) \wedge (s_r < s)$ is then the main step in justifying the first implication given above.
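The arithmetic behind this step is short. Written out under the stated assumption $s_l, s_r \geq 0$ and using only the symbols of the paragraph above:

```latex
\begin{align*}
s - s_l &= (s_l + s_r + 1) - s_l = s_r + 1 \geq 1 > 0
  &&\Longrightarrow\quad s_l < s, \\
s - s_r &= (s_l + s_r + 1) - s_r = s_l + 1 \geq 1 > 0
  &&\Longrightarrow\quad s_r < s.
\end{align*}
```

The non-negativity assumption is needed: with $s_l = -2$ and $s_r = 3$ we get $s = 2$, yet $s_r > s$.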

To  consider  another  example,  below  is  the  definition  of  a  predicate  for  a  list  of  integers 
where  the  notion  of  size  is  the  sum  of  the  integers  in  the  list.  Note  that  termination  of 
a  traversal  routine  could  be  established  for  such  a  notion  of  size  only  if  the  list  contains 
solely  positive  elements. 

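$$\begin{aligned}
\mathit{ls}(n, \mathit{first}, \mathit{next}) ={} & (\mathsf{emp} \wedge \mathit{first} = \mathit{next} \wedge n = 0) \\
\vee{} & (\exists z.\ ((\mathit{first} \mapsto [\mathsf{next} : z,\ \mathsf{data} : d]) * \mathit{ls}(n', z, \mathit{next})) \wedge n = n' + d)
\end{aligned}$$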

This  is  also  an  example  of  a  situation  where  there  is  not  a  condition  on  the  size  that 
uniquely  determines  which  case  of  the  definition  applies.  If  we  have  ls(n,  a,  b)  and  n  >  0, 
then  the  definition  above  specifies  that  the  list  must  be  non-empty.  However,  if  n  =  0, 
then  either  case  of  the  definition  may  hold. 
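The ambiguity at n = 0 is easy to see on concrete lists. The sketch below assumes a C layout `struct cell` for list cells; `list_sum` and `list_is_empty` are hypothetical helpers written for this example.

```c
#include <stddef.h>

struct cell { struct cell *next; int data; };

/* The n for which ls(n, first, end) holds under the sum measure. */
long list_sum(const struct cell *first, const struct cell *end)
{
    long n = 0;
    for (; first != end; first = first->next)
        n += first->data;
    return n;
}

/* Whether the segment from first to end is empty (first case of ls). */
int list_is_empty(const struct cell *first, const struct cell *end)
{
    return first == end;
}
```

An empty list and a one-element list holding 0 both have sum 0, so n = 0 does not tell us which case of the definition applies. For a list of strictly positive elements, n > 0 does imply that the list is non-empty.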


4.3  Soundness 

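In this section, we prove that instrumented programs meeting our criteria simulate the original program. This takes us half-way to numeric abstractions. In Section 4.4, we complete the formal development by showing how numeric abstractions can be extracted from instrumented programs.

Definition 31. Let $R^{V,\Gamma}$ be the relation on execution states defined as follows. We use the notation $\overline{V}$ to abbreviate the set $\mathit{Vars} - V$.

$$\begin{array}{lcll}
\mathsf{goto}(l, (s, h)) & R^{V,\Gamma} & \mathsf{goto}(\hat{l}, (\hat{s}, \hat{h})) & \text{iff } ((\hat{s}, \hat{h}) \models \Gamma(l)) \wedge (l = \hat{l}) \wedge (s =_{\overline{V}} \hat{s}) \wedge (h = \hat{h}) \\
(k, (s, h)) & R^{V,\Gamma} & (\hat{k}, (\hat{s}, \hat{h})) & \text{iff } \exists Q.\ (\Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k) \wedge ((\hat{s}, \hat{h}) \models Q) \wedge (s =_{\overline{V}} \hat{s}) \wedge (h = \hat{h}) \\
\mathsf{final}(s, h) & R^{V,\Gamma} & \mathsf{final}(\hat{s}, \hat{h}) & \text{iff } (s =_{\overline{V}} \hat{s}) \wedge (h = \hat{h}) \\
\mathsf{error} & R^{V,\Gamma} & \mathsf{error} &
\end{array}$$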

We can now state the main theorem associated with the proof system in Figure 4.1. This states that, if $\hat{P}$ is an instrumented version of P according to the proof rules in Figures 4.1 and 4.2, then P with initial states satisfying $\Gamma(\mathit{initloc}(P))$ is simulated by $\hat{P}$ with the same set of initial states.

Theorem 22. (Soundness) Let $Q_0 = \Gamma(\mathit{initloc}(P))$. Then $\Gamma \vdash \hat{P} \blacktriangleright_V P$ implies

$$(\!(P \mid Q_0)\!) \sqsubseteq_{R^{V,\Gamma},\, =_{\overline{V}}} (\!(\hat{P} \mid Q_0)\!).$$

Proof. We must show that $R^{V,\Gamma}$ satisfies the conditions in Definition 29. We consider each condition in order.

goal (Initial States Related):

By Definition 14 we have that the initial states $I$ of $(\!(P \mid Q_0)\!)$ are

$$I = \{\mathsf{goto}(l_0, (s, h)) \mid (l_0 = \mathit{initloc}(P)) \wedge (s, h) \models Q_0\}$$

and the initial states $\hat{I}$ of $(\!(\hat{P} \mid Q_0)\!)$ are

$$\hat{I} = \{\mathsf{goto}(\hat{l}_0, (\hat{s}, \hat{h})) \mid (\hat{l}_0 = \mathit{initloc}(\hat{P})) \wedge (\hat{s}, \hat{h}) \models Q_0\}$$

We must show that $\forall \gamma \in I.\ \exists \hat{\gamma} \in \hat{I}.\ \gamma\ R^{V,\Gamma}\ \hat{\gamma}$. Consider $\gamma \in I$. We have that $\gamma = \mathsf{goto}(l_0, (s, h))$ where $l_0 = \mathit{initloc}(P)$ and $(s, h) \models Q_0$. Since $Q_0 = \Gamma(l_0)$ we have $(s, h) \models \Gamma(l_0)$. By our definition of $R^{V,\Gamma}$, we then have the following.

$$\mathsf{goto}(l_0, (s, h))\ R^{V,\Gamma}\ \mathsf{goto}(l_0, (s, h))$$

By Lemma 14 we have $\mathit{initloc}(\hat{P}) = \mathit{initloc}(P)$, thus we have that $\mathsf{goto}(l_0, (s, h)) \in \hat{I}$, completing the proof of this case.

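goal ($=_{\overline{V}}$-equivalent):

$$\forall \gamma_1, \gamma_2.\ (\gamma_1\ R^{V,\Gamma}\ \gamma_2) \Rightarrow (\gamma_1 =_{\overline{V}} \gamma_2)$$

This follows immediately from our definition of $R^{V,\Gamma}$ and the definition of the $=_{\overline{V}}$ relation.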

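goal (P Transitions Match): If $\gamma\ R^{V,\Gamma}\ \hat{\gamma}$ and $\gamma \to_P \gamma'$ then one of the following holds

1. ($\hat{P}$ Matches) $\hat{\gamma} \to_{\hat{P}} \hat{\gamma}'$ and $\gamma'\ R^{V,\Gamma}\ \hat{\gamma}'$

2. (P Stutters) $(\gamma'\ R^{V,\Gamma}\ \hat{\gamma})$ and $(\mathit{rankt}(\gamma', \hat{\gamma}) < \mathit{rankt}(\gamma, \hat{\gamma}))$

3. ($\hat{P}$ Stutters) $\hat{\gamma} \to_{\hat{P}} \hat{\gamma}'$ and $\gamma\ R^{V,\Gamma}\ \hat{\gamma}'$ and $\mathit{rankl}(\hat{\gamma}', \gamma, \gamma') < \mathit{rankl}(\hat{\gamma}, \gamma, \gamma')$.

Since $\gamma \to_P \gamma'$ we know that $\gamma$ either has the form $\mathsf{goto}(l, (s, h))$ or $(k, (s, h))$.

Goto State. Suppose it has the form $\mathsf{goto}(l, (s, h))$. Then by the definition of $R^{V,\Gamma}$, the state $\hat{\gamma}$ must have the form $\mathsf{goto}(\hat{l}, (\hat{s}, \hat{h}))$ with $(\hat{s}, \hat{h}) \models \Gamma(l)$ and $l = \hat{l}$ and $s =_{\overline{V}} \hat{s}$ and $h = \hat{h}$. We have from the definitions of $\to_P$ and $\to_{\hat{P}}$ that

$$\mathsf{goto}(l, (s, h)) \to_P (P(l), (s, h))$$

and

$$\mathsf{goto}(\hat{l}, (\hat{s}, \hat{h})) \to_{\hat{P}} (\hat{P}(\hat{l}), (\hat{s}, \hat{h}))$$

Since $l = \hat{l}$, the second statement is equivalent to

$$\mathsf{goto}(\hat{l}, (\hat{s}, \hat{h})) \to_{\hat{P}} (\hat{P}(l), (\hat{s}, \hat{h}))$$

We will show that condition 1 holds ($\hat{P}$ matches). This corresponds to the statement below.

$$(P(l), (s, h))\ R^{V,\Gamma}\ (\hat{P}(l), (\hat{s}, \hat{h}))$$

This follows from the conclusions of Lemma 14. We already have that $s =_{\overline{V}} \hat{s}$ and $h = \hat{h}$ and $(\hat{s}, \hat{h}) \models \Gamma(l)$. Lemma 14 gives us that $\Gamma \vdash \{\Gamma(l)\}\ \hat{P}(l) \blacktriangleright_V P(l)$, which is the last condition needed to establish that the states are $R^{V,\Gamma}$-related.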


Intermediate State. Now we consider the case where $\gamma$ has the form $(k, (s, h))$. From the definition of $R^{V,\Gamma}$ for states of this form, we have that there exists a Q such that the following hold.

(Assumption 1) $\Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k$
(Assumption 2) $(\hat{s}, \hat{h}) \models Q$
(Assumption 3) $s =_{\overline{V}} \hat{s}$
(Assumption 4) $h = \hat{h}$

We will show that for all choices of $k, s, h, \hat{k}, \hat{s}, \hat{h}$ consistent with these assumptions, one of the goal conditions holds (either $\hat{P}$ matches, P stutters, or $\hat{P}$ stutters). The proof is by induction on the derivation of $\Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k$ with one case for each rule in Figure 4.1. The induction is required to handle the Strengthening rule. Figure 4.11 summarizes the variables used throughout this proof.

In the cases where either P or $\hat{P}$ stutters, we must also show that a ranking function decreases, in order to rule out the possibility of an infinite sequence of states being matched by a single state (and thus infinite traces being matched by finite traces). The ranking function in this case will simply be the size of the continuation k in a state of the form $(k, (s, h))$ and 0 in the case of error or $\mathsf{final}(s, h)$. Formally, we have the following definitions for rankt and rankl, where $\mathit{size}(k)$ represents the number of nodes in the abstract syntax tree for k.

$$\begin{array}{lcl}
\mathit{rankt}((k, (s, h)), \hat{\gamma}) & = & \mathit{size}(k) \\
\mathit{rankt}(\mathsf{error}, \hat{\gamma}) & = & 0 \\
\mathit{rankt}(\mathsf{final}(s, h), \hat{\gamma}) & = & 0
\end{array}
\qquad
\begin{array}{lcl}
\mathit{rankl}((\hat{k}, (\hat{s}, \hat{h})), \gamma, \gamma') & = & \mathit{size}(\hat{k}) \\
\mathit{rankl}(\mathsf{error}, \gamma, \gamma') & = & 0 \\
\mathit{rankl}(\mathsf{final}(\hat{s}, \hat{h}), \gamma, \gamma') & = & 0
\end{array}$$

[Diagram: $\gamma = (k, (s, h))$ and $\hat{\gamma} = (\hat{k}, (\hat{s}, \hat{h}))$ related by $R^{V,\Gamma}$, with transitions $\gamma \to_P \gamma'$ and $\hat{\gamma} \to_{\hat{P}} \hat{\gamma}'$.]

Figure 4.11: Guide to variable names used throughout the proof of Theorem 22. In each case of the proof, our goal is to show that one of the dashed relation lines exists.

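CASE Halt:

$$\frac{}{\Gamma \vdash \{Q\}\ \mathsf{halt} \blacktriangleright_V \mathsf{halt}}$$

In this case, $k = \mathsf{halt}$ and $\hat{k} = \mathsf{halt}$ and $\gamma' = \mathsf{final}(s, h)$. Since $\hat{k} = \mathsf{halt}$, we have that $(\hat{k}, (\hat{s}, \hat{h})) \to_{\hat{P}} \mathsf{final}(\hat{s}, \hat{h})$. It remains to show that $\mathsf{final}(s, h)\ R^{V,\Gamma}\ \mathsf{final}(\hat{s}, \hat{h})$. This follows from (Assumption 3), (Assumption 4), and the definition of $R^{V,\Gamma}$. Thus, we have shown that $\hat{P}$ can match the transition.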

CASE Abort:

$$\frac{}{\Gamma \vdash \{Q\}\ \mathsf{abort} \blacktriangleright_V \mathsf{abort}}$$

In this case, $k = \mathsf{abort}$ and $\hat{k} = \mathsf{abort}$. Thus, $\gamma' = \mathsf{error}$. We have immediately from the definition of $\to_{\hat{P}}$ that $(\mathsf{abort}, (\hat{s}, \hat{h})) \to_{\hat{P}} \mathsf{error}$. We have that $\mathsf{error}\ R^{V,\Gamma}\ \mathsf{error}$ by the definition of $R^{V,\Gamma}$ for final states. Thus, we have shown that $\hat{P}$ can match the transition.


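CASE Goto:

$$\frac{\Gamma(l) = Q}{\Gamma \vdash \{Q\}\ \mathsf{goto}\ l \blacktriangleright_V \mathsf{goto}\ l}$$

This is very similar to the halt case. We have that $k = \mathsf{goto}\ l$ and $\hat{k} = \mathsf{goto}\ l$. By the definition of $\to_P$ and $\to_{\hat{P}}$ we have $(k, (s, h)) \to_P \mathsf{goto}(l, (s, h))$ and $(\hat{k}, (\hat{s}, \hat{h})) \to_{\hat{P}} \mathsf{goto}(l, (\hat{s}, \hat{h}))$. We must show that $\mathsf{goto}(l, (s, h))\ R^{V,\Gamma}\ \mathsf{goto}(l, (\hat{s}, \hat{h}))$, which requires showing that $s =_{\overline{V}} \hat{s}$, $h = \hat{h}$, and $(\hat{s}, \hat{h}) \models \Gamma(l)$. The first two are exactly (Assumption 3) and (Assumption 4). The last follows from (Assumption 2) by the premise of this rule, which states that $\Gamma(l) = Q$. Thus, $\hat{P}$ matches the transition.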


CASE Command:

$$\frac{\{Q\}\ c\ \{Q'\} \qquad \Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k}{\Gamma \vdash \{Q\}\ (c;\ \hat{k}) \blacktriangleright_V (c;\ k)}$$

We have from (Assumption 4) that $h = \hat{h}$. From the definition of $\to_P$, we have the transition $((c; k), (s, h)) \to_P \gamma'$ where either

$$\gamma' = \mathsf{error}$$

or

$$\gamma' = (k, (s', h')) \wedge (s', h') \in [\![c]\!]\, (s, h)$$

For the error case, we apply Corollary 3 to obtain $V \cap \mathit{fv}(c) = \emptyset$ and thus $\mathit{fv}(c) \subseteq \overline{V}$. This together with (Assumption 3) allows us to apply Lemma 3 and obtain $\mathsf{error} \in [\![c]\!]\, (\hat{s}, h)$ and thus $((c; \hat{k}), (\hat{s}, h)) \to_{\hat{P}} \mathsf{error}$. This completes this case since $\mathsf{error}\ R^{V,\Gamma}\ \mathsf{error}$.

For the non-error case, we apply Corollary 3 to obtain $\mathit{fv}(c) \subseteq \overline{V}$. This and (Assumption 3) allows us to apply Lemma 2, which gives us an $\hat{s}'$ such that $(\hat{s}', h') \in [\![c]\!]\, (\hat{s}, h)$ and $s' =_{\overline{V}} \hat{s}'$. The semantics of continuations then gives us that $((c; \hat{k}), (\hat{s}, h)) \to_{\hat{P}} (\hat{k}, (\hat{s}', h'))$. Applying our equality $h = \hat{h}$ to this transition we then have $((c; \hat{k}), (\hat{s}, \hat{h})) \to_{\hat{P}} (\hat{k}, (\hat{s}', h'))$.

Our goal is to show that $(k, (s', h'))\ R^{V,\Gamma}\ (\hat{k}, (\hat{s}', h'))$. We have shown one condition of $R^{V,\Gamma}$, namely that $s' =_{\overline{V}} \hat{s}'$. The condition on heaps in this case is $h' = h'$, which is immediate. It remains to show that $(\hat{s}', h') \models Q'$ and $\Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k$.

From (Assumption 2) and $(\hat{s}', h') \in [\![c]\!]\, (\hat{s}, \hat{h})$ and $\{Q\}\ c\ \{Q'\}$ we have $(\hat{s}', h') \models Q'$. From the second premise of the rule under consideration we have $\Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k$. These were the only remaining conditions, so we have shown that $\hat{P}$ can match P's transition.


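CASE Strengthening:

$$\frac{Q \Rightarrow Q' \qquad \Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k}{\Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k}$$

We have $\Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k$ by the second premise and $(\hat{s}, \hat{h}) \models Q$ by (Assumption 2). Since $Q \Rightarrow Q'$ we have $(\hat{s}, \hat{h}) \models Q'$. This, together with (Assumption 3) and (Assumption 4) allows us to apply the induction hypothesis on $\Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k$, thus proving the goal.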


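CASE Branch:

$$\frac{\forall i.\ (\Gamma \vdash \{Q \wedge e_i\}\ \hat{k}_i \blacktriangleright_V k_i)}{\Gamma \vdash \{Q\}\ \mathsf{branch}\ \ldots,\ e_i \Rightarrow \hat{k}_i,\ \ldots\ \mathsf{end} \blacktriangleright_V \mathsf{branch}\ \ldots,\ e_i \Rightarrow k_i,\ \ldots\ \mathsf{end}}$$

Since $\gamma \to_P \gamma'$ we have that $[\![e_i]\!]\, s = \mathsf{true}$ for some i and $\gamma' = (k_i, (s, h))$. By Corollary 3 we have that $V \cap \mathit{fv}(e_i) = \emptyset$. Thus, $\mathit{fv}(e_i) \subseteq \overline{V}$. This lets us apply Lemma 1 to conclude that $[\![e_i]\!]\, \hat{s} = \mathsf{true}$. Thus, $\hat{\gamma} \to_{\hat{P}} \hat{\gamma}'$ and $\hat{\gamma}' = (\hat{k}_i, (\hat{s}, \hat{h}))$.

Since $[\![e_i]\!]\, \hat{s} = \mathsf{true}$ and $(\hat{s}, \hat{h}) \models Q$ by (Assumption 2) we have $(\hat{s}, \hat{h}) \models Q \wedge e_i$. We also have $\Gamma \vdash \{Q \wedge e_i\}\ \hat{k}_i \blacktriangleright_V k_i$ as one of the premises of the rule under consideration. Then $\gamma'\ R^{V,\Gamma}\ \hat{\gamma}'$ follows from these facts and (Assumption 3) and (Assumption 4). We have shown that in this case $\hat{P}$ can match the transition that P takes.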

CASE False:

$$\frac{}{\Gamma \vdash \{\mathsf{false}\}\ \mathsf{halt} \blacktriangleright_V k}$$

This case holds vacuously. One of our assumptions is that $(\hat{s}, \hat{h}) \models Q$. But in this case $Q = \mathsf{false}$. Since there are no states satisfying false, our assumptions are contradictory.


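CASE Inst-Assign:

$$\frac{\{Q\}\ x := e\ \{Q'\} \qquad \Gamma \vdash \{Q'\}\ \hat{k} \blacktriangleright_V k}{\Gamma \vdash \{Q\}\ (x := e;\ \hat{k}) \blacktriangleright_V k} \quad x \in V$$

We will show that $\hat{P}$ stutters. We have that $\hat{\gamma} = ((x := e; \hat{k}), (\hat{s}, \hat{h}))$ and, applying the definition of $\to_{\hat{P}}$, we have $\hat{\gamma} \to_{\hat{P}} \hat{\gamma}'$ where $\hat{\gamma}' = (\hat{k}, (\hat{s}[x \mapsto [\![e]\!]\, \hat{s}], \hat{h}))$. Since $x \in V$ we have $\hat{s}[x \mapsto [\![e]\!]\, \hat{s}] =_{\overline{V}} \hat{s}$ and thus, by (Assumption 3) and transitivity of $=_{\overline{V}}$ we have $\hat{s}[x \mapsto [\![e]\!]\, \hat{s}] =_{\overline{V}} s$. This is one condition required to establish $\gamma\ R^{V,\Gamma}\ \hat{\gamma}'$.

The premise $\{Q\}\ x := e\ \{Q'\}$ and (Assumption 2) allow us to conclude that $(\hat{s}[x \mapsto [\![e]\!]\, \hat{s}], \hat{h}) \models Q'$. This is another condition for $\gamma\ R^{V,\Gamma}\ \hat{\gamma}'$. The second premise of the rule under consideration and (Assumption 4) provide the other two conditions, completing the proof that $\gamma\ R^{V,\Gamma}\ \hat{\gamma}'$.

We must also show that rankl decreases. We have $\mathit{rankl}(\hat{\gamma}, \gamma, \gamma') = \mathit{size}(x := e; \hat{k})$ and $\mathit{rankl}(\hat{\gamma}', \gamma, \gamma') = \mathit{size}(\hat{k})$. Since $\mathit{size}(\hat{k})$ is the size of the abstract syntax tree for $\hat{k}$, we have that $\mathit{size}(\hat{k}) < \mathit{size}(x := e; \hat{k})$.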

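CASE Inst-Disj:

$$\frac{\Gamma \vdash \{Q_1\}\ \hat{k}_1 \blacktriangleright_V k \qquad \Gamma \vdash \{Q_2\}\ \hat{k}_2 \blacktriangleright_V k}{\Gamma \vdash \{Q_1 \vee Q_2\}\ \mathsf{branch}\ \mathsf{true} \Rightarrow \hat{k}_1,\ \mathsf{true} \Rightarrow \hat{k}_2\ \mathsf{end} \blacktriangleright_V k}$$

We will show that $\hat{\gamma}$ makes a stuttering transition. That is, $\hat{\gamma} \to_{\hat{P}} \hat{\gamma}'$ and $\gamma\ R^{V,\Gamma}\ \hat{\gamma}'$. From (Assumption 2) we have that $(\hat{s}, \hat{h}) \models Q_1 \vee Q_2$. This implies that either $(\hat{s}, \hat{h}) \models Q_1$ or $(\hat{s}, \hat{h}) \models Q_2$.

Suppose the first case holds, so $(\hat{s}, \hat{h}) \models Q_1$. Then let $\hat{\gamma}'$ be $(\hat{k}_1, (\hat{s}, \hat{h}))$. Since $(\hat{s}, \hat{h}) \models \mathsf{true}$, we have that $\hat{\gamma} \to_{\hat{P}} \hat{\gamma}'$. That $\gamma\ R^{V,\Gamma}\ \hat{\gamma}'$ then follows from the first premise, (Assumption 3), (Assumption 4), and $(\hat{s}, \hat{h}) \models Q_1$, which was our assumption for this case.

The $(\hat{s}, \hat{h}) \models Q_2$ case is similar, with $Q_2$ substituted for $Q_1$ and the second premise used in place of the first premise.

The condition that rankl decreases is satisfied since $\hat{k}_1$ is a smaller term than $\mathsf{branch}\ \mathsf{true} \Rightarrow \hat{k}_1,\ \mathsf{true} \Rightarrow \hat{k}_2\ \mathsf{end}$.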


CASE Inst-Exists:

$$\frac{\Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k}{\Gamma \vdash \{\exists x^{\tau}.\, Q\}\ (x := ?^{\tau};\ \hat{k}) \blacktriangleright_V k} \quad x \in V$$

This is similar to the previous case, except that the non-determinism is unbounded rather than a choice between two alternatives. We will consider only the case where $\tau = i$. The case for a is similar. We have that $(\hat{s}, \hat{h}) \models \exists x^i.\, Q$ and thus, by the semantics of existential quantifiers there is some $v \in \mathbb{Z}$ such that $(\hat{s}[x^i \mapsto v], \hat{h}) \models Q$. From the semantics for non-deterministic assignment, we know there is some execution of $x^i := ?^i$ that assigns v to $x^i$. Formally, we have that $(\hat{s}[x^i \mapsto v], \hat{h}) \in [\![x^i := ?^i]\!]\, (\hat{s}, \hat{h})$ which implies that $((x^i := ?^i; \hat{k}), (\hat{s}, \hat{h})) \to_{\hat{P}} \hat{\gamma}'$ where $\hat{\gamma}' = (\hat{k}, (\hat{s}[x^i \mapsto v], \hat{h}))$. It remains to show that $\gamma\ R^{V,\Gamma}\ \hat{\gamma}'$.

We have $(\hat{s}[x^i \mapsto v], \hat{h}) \models Q$ and $\Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k$. Since $x^i \in V$ and $\overline{V}$ is the complement of V, we have that $x^i \notin \overline{V}$. This allows us to conclude that $\hat{s}[x^i \mapsto v] =_{\overline{V}} \hat{s}$ and thus, by transitivity of $=_{\overline{V}}$ and (Assumption 3) we have $\hat{s}[x^i \mapsto v] =_{\overline{V}} s$. This is the third of the four conditions for establishing $\gamma\ R^{V,\Gamma}\ \hat{\gamma}'$. (Assumption 4) provides the fourth condition and completes the proof.

As before, the condition on rankl reduces to showing that $\mathit{size}(\hat{k}) < \mathit{size}(x^i := ?^i; \hat{k})$ which is immediate.


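CASE Inst-Assume:

$$\frac{Q \Rightarrow e \qquad \Gamma \vdash \{Q\}\ \hat{k} \blacktriangleright_V k}{\Gamma \vdash \{Q\}\ \mathsf{assume}(e);\ \hat{k} \blacktriangleright_V k}$$

We will show that $((\mathsf{assume}(e); \hat{k}), (\hat{s}, \hat{h})) \to_{\hat{P}} \hat{\gamma}'$ and $\gamma\ R^{V,\Gamma}\ \hat{\gamma}'$. The transition can occur if $(\hat{s}, \hat{h}) \models e$. We have from (Assumption 2) that $(\hat{s}, \hat{h}) \models Q$. The premise $Q \Rightarrow e$ then gives us that $(\hat{s}, \hat{h}) \models e$. It remains to show that $\gamma\ R^{V,\Gamma}\ \hat{\gamma}'$. This follows from (Assumption 2), (Assumption 3), (Assumption 4), and the second premise.

As before, since $\mathit{size}(\hat{k}) < \mathit{size}(\mathsf{assume}(e); \hat{k})$ we have that rankl decreases.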

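goal (Final States Related):

By Definition 14 we have that the final states $F$ of $(\!(P \mid Q_0)\!)$ are

$$F = \{\mathsf{final}(s, h) \mid s \in \mathit{Stores} \wedge h \in \mathit{Heaps}\} \cup \{\mathsf{error}\}$$

The final states $\hat{F}$ of $(\!(\hat{P} \mid Q_0)\!)$ are the same.

$$\hat{F} = \{\mathsf{final}(s, h) \mid s \in \mathit{Stores} \wedge h \in \mathit{Heaps}\} \cup \{\mathsf{error}\}$$

We must show the following for all execution states $\gamma$ and $\hat{\gamma}$.

$$(\gamma\ R^{V,\Gamma}\ \hat{\gamma}) \Rightarrow (\gamma \in F \Leftrightarrow \hat{\gamma} \in \hat{F})$$

This follows directly from our definition of $R^{V,\Gamma}$. Examining Definition 31, we can see that error is only $R^{V,\Gamma}$-related to error and $\mathsf{final}(s, h)$ is only $R^{V,\Gamma}$-related to $\mathsf{final}(\hat{s}, \hat{h})$. □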

Below we make note of an important corollary. This follows from the theorem above (Theorem 22), Theorem 18, and Corollary 2.

Corollary 4. Let $Q_0 = \Gamma(\mathit{initloc}(P))$. Then $\Gamma \vdash \hat{P} \blacktriangleright_V P$ and $(\!(\hat{P} \mid Q_0)\!) \models \varphi$ implies

$$(\!(P \mid Q_0)\!) \models \exists(V, \varphi)$$

This tells us that if we prove some LTSL formula holds of $(\!(\hat{P} \mid Q_0)\!)$, we can obtain an LTSL formula that holds of $(\!(P \mid Q_0)\!)$ by existentially quantifying the instrumentation variables appearing in the formula. As a special case, formulas that hold of $\hat{P}$ and do not contain instrumentation variables do not need to be changed. The same formula that held of $\hat{P}$ will also hold of P.

As  an  example,  consider  the  program  below. 

    L0 : goto L1

    L1 : (1) branch x ≠ nil ⇒ (2) x := x.next; (3) goto L1,
                    x = nil ⇒ (4) halt end

The following is an instrumented version of this program.

    L0 : n2 := n; n1 := 0; goto L1

    L1 : branch x ≠ nil ⇒ x := x.next; n1 := n1 + 1;
                          n2 := n2 - 1; goto L1,
                x = nil ⇒ halt end

Starting from the precondition $ls(n, x, \mathsf{nil})$ we can show that the following formula holds of the instrumented program.

$$G(\mathit{atloc}(L_1) \supset (\exists x'.\ ls(n_1, x', x) * ls(n_2, x, \mathsf{nil})) \land n_1 + n_2 = n)$$

This states that if $n$ is the length of the list before executing the code, then at $L_1$, during every iteration of the loop, $n_1$ and $n_2$ sum to $n$. Note that $n$ is not an instrumentation variable here, but a program variable containing the initial length of the list. Our corollary above then tells us that the following LTSL formula holds of the original program.

$$G(\mathit{atloc}(L_1) \supset (\exists n_1, n_2, x'.\ (ls(n_1, x', x) * ls(n_2, x, \mathsf{nil})) \land n_1 + n_2 = n))$$

This is the same formula as before, but with the instrumentation variables $n_1$ and $n_2$ existentially quantified. This loop invariant is strong enough to let us conclude that the length of the list is unchanged by the traversal.
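The invariant can be checked on concrete lists. The following is a minimal sketch, assuming a hypothetical heap encoding in which cell `i` points to cell `i + 1` and the integer `n` stands in for nil; the helper `seg_len` and the function name are illustrative and not part of the formal development. It runs the instrumented traversal and checks, at every visit to L1, that the segment from the head to `x` has length `n1`, the segment from `x` to nil has length `n2`, and `n1 + n2 = n`.

```python
def check_traversal_invariant(n):
    """Run the instrumented traversal on a list of length n and check the
    invariant at every visit to L1. Returns the number of visits to L1."""
    nil = n                                  # assumed encoding: address n is nil
    nxt = {i: i + 1 for i in range(n)}       # cell i points to cell i + 1
    head = 0

    def seg_len(a, b):
        """Number of next-steps from a to b, i.e. the m in ls(m, a, b)."""
        steps = 0
        while a != b:
            a = nxt[a]
            steps += 1
        return steps

    x = head
    n2, n1 = n, 0                            # L0: n2 := n; n1 := 0; goto L1
    visits = 0
    while True:                              # each pass is one visit to L1
        visits += 1
        assert seg_len(head, x) == n1        # ls(n1, x', x) with witness x' = head
        assert seg_len(x, nil) == n2         # ls(n2, x, nil)
        assert n1 + n2 == n
        if x == nil:                         # x = nil => halt
            return visits
        x = nxt[x]                           # x := x.next
        n1 += 1                              # n1 := n1 + 1
        n2 -= 1                              # n2 := n2 - 1
```

Testing a few lengths does not replace the proof, but it makes the roles of the two counters concrete: `n1` counts the cells already passed and `n2` the cells still ahead.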


4.4  Numeric  Abstractions 

In Figure 4.12 we give the rules for generating a projection of a continuation onto a set of variables $V$. This results in a continuation that only involves reads and writes to variables in $V$ and does not include any heap commands. The projection function $\pi_V(k)$ is defined with the help of the predicates $W_V(c)$ and $\mathit{det}_V(c)$.

The predicate $W_V(c)$ holds if the command $c$ writes to a variable in $V$. For example, if $V = \{x\}$, then $x := \mathsf{alloc}(\ldots)$ satisfies this since it results in the newly allocated address being written to $x$, which is in $V$. The other commands that write to $x$ are $x := e$, $x := {?}$, and $x := x_2.f$.

The predicate $\mathit{det}_V(c)$ holds if the result of $c$ is determined given only the values of the variables in $V$ (and, crucially, given no access to the heap). The only command that satisfies this is $x := e$ in the case where $\mathit{fv}(e) \subseteq V$.

The function $\pi_V(k)$ discards commands that do not write to variables in $V$ and it replaces with non-deterministic assignment any commands that write to variables in $V$ but are not determined. The result is that writes into heap cells and free $x$ commands are always discarded. Allocation and heap lookup are replaced with non-deterministic assignment. Non-deterministic assignments present in the original program are carried through to the projected program provided they affect a variable in $V$. For deterministic assignment commands $x := e$, the command is discarded if $x \notin V$, it is converted to the non-deterministic assignment $x := {?}$ if $e$ contains any variables not in $V$, and otherwise it is carried through unchanged.

Branch conditions are carried over unchanged if the condition only involves variables in $V$ or, if variables outside of $V$ are required, the condition is replaced by true. With such an approach, when we encounter a branch that cannot be evaluated accurately in the projection, we conservatively assume that the branch can be taken, thus erring on the side of exploring more paths (and consequently maintaining soundness for universal properties over paths, such as our LTSL formulae). Note that $\mathit{fv}(\pi_V(P)) \subseteq V$, a fact that can be verified by induction over the structure of $P$.
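The projection rules described above can be sketched as a small recursive function. This is only an illustration under stated assumptions: the AST classes (`Assign`, `Havoc`, `Alloc`, `Lookup`, `HeapWrite`, `Free`, `Seq`, `Branch`, `Leaf`) are hypothetical names invented for the sketch, expressions are kept as strings with their free variables recorded alongside, and assume commands are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical AST for continuations; all class and field names are illustrative.
@dataclass(frozen=True)
class Assign:                 # x := e, with fv(e) recorded explicitly
    target: str
    expr: str
    fv: FrozenSet[str]

@dataclass(frozen=True)
class Havoc:                  # x := ?
    target: str

@dataclass(frozen=True)
class Alloc:                  # x := alloc(...)
    target: str

@dataclass(frozen=True)
class Lookup:                 # x := x2.f
    target: str
    source: str
    field: str

@dataclass(frozen=True)
class HeapWrite:              # x.f := e
    base: str
    field: str
    expr: str

@dataclass(frozen=True)
class Free:                   # free x
    target: str

@dataclass(frozen=True)
class Seq:                    # c; k
    cmd: object
    rest: object

@dataclass(frozen=True)
class Branch:                 # branch e1 => k1, ..., en => kn end
    cases: Tuple[Tuple[str, FrozenSet[str], object], ...]   # (e, fv(e), k)

@dataclass(frozen=True)
class Leaf:                   # abort, halt, or goto l
    text: str


def writes(c, V):
    """W_V(c): c assigns to some store variable x in V."""
    return isinstance(c, (Assign, Havoc, Alloc, Lookup)) and c.target in V


def determined(c, V):
    """det_V(c): c is x := e with fv(e) a subset of V."""
    return isinstance(c, Assign) and c.fv <= V


def project(k, V):
    """pi_V(k), by cases on the form of the continuation."""
    if isinstance(k, Leaf):
        return k
    if isinstance(k, Branch):
        return Branch(tuple(
            (e, fv, project(sub, V)) if fv <= V
            else ("true", frozenset(), project(sub, V))
            for e, fv, sub in k.cases))
    rest = project(k.rest, V)
    if not writes(k.cmd, V):
        return rest                          # command is discarded
    if determined(k.cmd, V):
        return Seq(k.cmd, rest)              # carried through unchanged
    return Seq(Havoc(k.cmd.target), rest)    # replaced by x := ?


# The loop of the instrumented list traversal shown earlier in this chapter.
loop = Branch((
    ("x != nil", frozenset({"x"}),
     Seq(Lookup("x", "x", "next"),
         Seq(Assign("n1", "n1 + 1", frozenset({"n1"})),
             Seq(Assign("n2", "n2 - 1", frozenset({"n2"})),
                 Leaf("goto L1"))))),
    ("x = nil", frozenset({"x"}), Leaf("halt")),
))
numeric = project(loop, frozenset({"n1", "n2"}))
```

Projecting onto `{n1, n2}` drops the heap lookup and turns both branch conditions into `true`, while projecting onto `{x}` would instead keep the conditions and replace the lookup with `x := ?`.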

The projection operation for programs is defined as follows (where $\pi_V(P(l))$ refers to the projection of the continuation $P(l)$, as defined in Figure 4.12).

Definition 32. The projection of a program $P$ onto variables $V$, written $\pi_V(P)$, is the program $P'$ such that $\mathit{dom}(P') = \mathit{dom}(P)$, $\mathit{initloc}(P') = \mathit{initloc}(P)$ and $\forall l \in \mathit{dom}(P)$.

$$P'(l) = \pi_V(P(l))$$

Our numeric programs will be the result of projecting an instrumented program onto a subset of the integer-valued variables. These variables can include instrumentation variables as well as program variables. Maintaining program variables in the projection is necessary when the LTSL formula being checked contains program variables. It may be necessary in other cases as well: for example, if termination depends on the fact that a program variable is decreasing and has a lower bound, then that variable must be preserved in the projection.

Commands that write to variables in $V$

$$W_V(c) \iff \text{for some } x \in V,\ c \text{ has the form } x := e \text{ or } x := {?} \text{ or } x := \mathsf{alloc}(\ldots) \text{ or } x := x_2.f$$

Commands that are determined given $V$

$$\mathit{det}_V(c) \iff c \text{ has the form } x := e \text{ and } \mathit{fv}(e) \subseteq V$$

Definition of $\pi_V(k)$

$$\pi_V(c; k) = \begin{cases} c;\ \pi_V(k) & \text{if } W_V(c) \text{ and } \mathit{det}_V(c) \\ x := {?};\ \pi_V(k) & \text{if } W_V(c) \text{ and } \neg\mathit{det}_V(c) \text{ and } c \text{ has the form } x := \ldots \\ \pi_V(k) & \text{otherwise} \end{cases}$$

$$\pi_V(\mathsf{branch}\ e_1 \Rightarrow k_1, \ldots, e_n \Rightarrow k_n\ \mathsf{end}) = \mathsf{branch}\ e_1' \Rightarrow \pi_V(k_1), \ldots, e_n' \Rightarrow \pi_V(k_n)\ \mathsf{end} \quad \text{where } e_i' = \begin{cases} e_i & \text{if } \mathit{fv}(e_i) \subseteq V \\ \mathsf{true} & \text{if } \mathit{fv}(e_i) \not\subseteq V \end{cases}$$

$$\pi_V(k) = k \quad \text{if } k = \mathsf{abort} \text{ or } k = \mathsf{halt} \text{ or } k = \mathsf{goto}\ l$$

Figure 4.12: Definition of the function $\pi_V(k)$ which projects a continuation onto variables in $V$.
4.4.1 Projection and Simulation

We now discuss how the concept of program projections fits into the formal framework presented earlier for instrumented programs. Recall the definition of $=_V$ (Definition 24), reproduced below.

Definition 24. $=_V$ is the least relation on execution states satisfying the following.

$$\begin{array}{ll}
\langle k, (s, h) \rangle =_V \langle k', (s', h') \rangle & \text{iff } s =_V s' \\
\mathsf{goto}(l, (s, h)) =_V \mathsf{goto}(l, (s', h')) & \text{iff } s =_V s' \\
\mathsf{final}(s, h) =_V \mathsf{final}(s', h') & \text{iff } s =_V s' \\
\mathsf{error} =_V \mathsf{error} &
\end{array}$$

This  will  be  the  relation  on  states  that  is  preserved  by  projection.  The  following  theo¬ 
rem  captures  this  fact.  The  proof  is  fairly  straightforward,  as  the  projection  translates  each 
command  or  branch  to  a  version  that  is  at  least  as  non-deterministic  as  the  original.  Thus, 
the  projected  command  /  branch  includes  the  original  behavior  as  well  as  possibly  some 
additional  behavior. 

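Theorem 23. If $P' = \pi_V(P)$ then there exists an $R$ such that for all $Q_0$, the following holds.

$$((P \mid Q_0)) \sqsubseteq_{R,\,=_V} ((P' \mid Q_0))$$

Proof. The $R$ in this case is the least relation satisfying the following.

$$\begin{array}{ll}
\langle k, (s, h) \rangle \mathrel{R} \langle k', (s', h') \rangle & \text{iff } k' = \pi_V(k) \text{ and } s =_V s' \\
\mathsf{goto}(l, (s, h)) \mathrel{R} \mathsf{goto}(l, (s', h')) & \text{iff } s =_V s' \\
\mathsf{final}(s, h) \mathrel{R} \mathsf{final}(s', h') & \text{iff } s =_V s' \\
\mathsf{error} \mathrel{R} \mathsf{error} &
\end{array}$$

The ranking functions $\mathit{rank}_l$ and $\mathit{rank}_t$ are defined as in the proof of Theorem 22 in Section 4.3 (see page 164).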
Initial States Related. First we show that initial states are related. Every state $\mathsf{goto}(\mathit{initloc}(P), (s, h))$ is related to the state $\mathsf{goto}(\mathit{initloc}(P'), (s, h))$. This holds because $P' = \pi_V(P)$ ensures that $\mathit{initloc}(P') = \mathit{initloc}(P)$ and reflexivity of $=_V$ gives us $s =_V s$. Together, these establish the necessary conditions for $R$ to hold, giving us

$$\mathsf{goto}(\mathit{initloc}(P), (s, h)) \mathrel{R} \mathsf{goto}(\mathit{initloc}(P'), (s, h))$$

$=_V$-equivalent. The second condition of stuttering simulation, that $R$ implies $=_V$, is easy to check. We can see that $R$ is strictly contained in $=_V$ since all the conditions are the same except that $R$ additionally requires $k' = \pi_V(k)$ in the case where $\langle k, (s, h) \rangle \mathrel{R} \langle k', (s', h') \rangle$.

Transitions Match. The third condition is that any transition of $P$ can be matched. Suppose $\gamma_1 \mathrel{R} \gamma_2$ and $\gamma_1 \xrightarrow{P} \gamma_1'$. Then $\gamma_1$ must either have the form $\mathsf{goto}(l, (s_1, h_1))$ or $\langle k_1, (s_1, h_1) \rangle$.

CASE $\gamma_1 = \mathsf{goto}(l, (s_1, h_1))$: By the definition of $R$, we have that $\gamma_2$ has the form $\mathsf{goto}(l, (s_2, h_2))$ with $s_1 =_V s_2$. By the semantics of program transitions, we have

$$\mathsf{goto}(l, (s_1, h_1)) \xrightarrow{P} \langle P(l), (s_1, h_1) \rangle$$

and

$$\mathsf{goto}(l, (s_2, h_2)) \xrightarrow{P'} \langle P'(l), (s_2, h_2) \rangle$$

We will show

$$\langle P(l), (s_1, h_1) \rangle \mathrel{R} \langle P'(l), (s_2, h_2) \rangle$$

We already have $s_1 =_V s_2$. It remains to show that $P'(l) = \pi_V(P(l))$. This follows directly from the definition of $\pi_V(P)$ and the fact that $P' = \pi_V(P)$. Expanding these definitions, we have that $\pi_V(P)(l) = \pi_V(P(l))$, which gives us our result.

CASE $\gamma_1 = \langle k_1, (s_1, h_1) \rangle$: Since $\gamma_1 \mathrel{R} \gamma_2$, we have that $\gamma_2$ has the form $\langle k_2, (s_2, h_2) \rangle$ with $s_1 =_V s_2$ and $k_2 = \pi_V(k_1)$. We now consider each possible form for $k_1$.

CASE $k_1 = (c; k_1')$: In this case, $k_2$, which is $\pi_V(k_1)$, depends on whether $W_V(c)$ and $\mathit{det}_V(c)$ are true.

SUB-CASE $W_V(c)$ AND $\mathit{det}_V(c)$: In this case, we have that $k_2 = (c; k_2')$ where $k_2' = \pi_V(k_1')$. That $\mathit{det}_V(c)$ holds ensures that $c = (x := e)$ and $\mathit{fv}(e) \subseteq V$ which, together with $s_1 =_V s_2$, ensures that $[\![e]\!]\, s_1 = [\![e]\!]\, s_2$ (by Lemma 1). Let $v$ be this value ($[\![e]\!]\, s_1$). The definition of $\xrightarrow{P}$ tells us that $\gamma_1 \xrightarrow{P} \langle k_1', (s_1[x \mapsto v], h_1) \rangle$. Similarly, we have that $\gamma_2 \xrightarrow{P'} \langle k_2', (s_2[x \mapsto v], h_2) \rangle$. We must show that $(s_1[x \mapsto v]) =_V (s_2[x \mapsto v])$. This follows from the fact that $s_1 =_V s_2$. We already have that $k_2' = \pi_V(k_1')$. Thus, $P'$ can match the transition.

SUB-CASE $W_V(c)$ AND $\neg\mathit{det}_V(c)$: In this case, $c$ has either the form $x := e$ or $x := {?}$ or $x := \mathsf{alloc}(\ldots)$ or $x := x_2.f$ for some $x \in V$. In all these cases, we have a transition $\langle (c; k_1'), (s_1, h_1) \rangle \xrightarrow{P} \langle k_1', (s_1', h_1') \rangle$. The exact conditions on $s_1'$ and $h_1'$ differ; however, in every case we have that $s_1' = s_1[x \mapsto v]$ for some $v$ in the appropriate domain (either addresses or integers depending on the type of $x$). We have $k_2 = \pi_V(k_1) = (x := {?};\ \pi_V(k_1'))$, which, given the semantics of $x := {?}$, ensures that

$$\langle k_2, (s_2, h_2) \rangle \xrightarrow{P'} \langle \pi_V(k_1'), (s_2[x \mapsto v], h_2) \rangle$$

That $(s_1[x \mapsto v]) =_V (s_2[x \mapsto v])$ then follows from $s_1 =_V s_2$, which we have from $\gamma_1 \mathrel{R} \gamma_2$. Thus, $P'$ can match the transition of $P$.

SUB-CASE $\neg W_V(c)$: In this case, $k_2 = \pi_V(k_1')$.

Either $c$ does not write to some store variable $x$ or it does but $x$ is not in $V$. If the command in question does not modify the store, then we have $\gamma_1' = \langle k_1', (s_1, h_1) \rangle$. We also have $\gamma_1 \mathrel{R} \gamma_2$ and will show that $\gamma_1' \mathrel{R} \gamma_2$ where we recall that $\gamma_2 = \langle k_2, (s_2, h_2) \rangle$. To do this we must show $s_1 =_V s_2$, which we already have from the definition of $R$ and $\gamma_1 \mathrel{R} \gamma_2$. We also must show that $k_2 = \pi_V(k_1')$, but this we already have from our assumptions. The only remaining condition is to show that the ranking function decreases. This is the case since $k_1'$ is a sub-term of $k_1$.

We now consider the case where the command $c$ modifies store variable $x$, but $x$ is not in $V$. Here we have that $\gamma_1' = \langle k_1', (s_1[x \mapsto v], h_1) \rangle$ for some $v$. We will show that $\gamma_1' \mathrel{R} \gamma_2$, where $\gamma_2 = \langle k_2, (s_2, h_2) \rangle$. We already have that $k_2 = \pi_V(k_1')$. We must also show that $(s_1[x \mapsto v]) =_V s_2$. This follows from $s_1 =_V s_2$ and $x \notin V$, which we have from our assumptions.
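CASE $k_1 = (\mathsf{branch}\ e_1 \Rightarrow k'_1, \ldots, e_n \Rightarrow k'_n\ \mathsf{end})$: In this case we have

$$k_2 = (\mathsf{branch}\ e_1' \Rightarrow \pi_V(k'_1), \ldots, e_n' \Rightarrow \pi_V(k'_n)\ \mathsf{end})$$

where $e_i' = e_i$ if $\mathit{fv}(e_i) \subseteq V$ or $e_i' = \mathsf{true}$ otherwise.

We are assuming that $\langle k_1, (s_1, h_1) \rangle \xrightarrow{P} \gamma_1'$. If this is the case, then $\gamma_1' = \langle k'_i, (s_1, h_1) \rangle$ for some $i$ such that $[\![e_i]\!]\, s_1 = \mathsf{true}$. We want to show that for $\gamma_2 = \langle k_2, (s_2, h_2) \rangle$ we have $\gamma_2 \xrightarrow{P'} \gamma_2'$ and $\gamma_1' \mathrel{R} \gamma_2'$. We first case split on whether $e_i' = \mathsf{true}$ or $e_i' = e_i$. In the first case, we are done since branches labeled with true can always be taken. So we have $\gamma_2 \xrightarrow{P'} \langle \pi_V(k'_i), (s_2, h_2) \rangle$. We already have $s_1 =_V s_2$, which is sufficient to show

$$\gamma_1' \mathrel{R} \langle \pi_V(k'_i), (s_2, h_2) \rangle.$$

In the case where $e_i' = e_i$, we use our assumption $[\![e_i]\!]\, s_1 = \mathsf{true}$. Since $s_1 =_V s_2$, we have $[\![e_i]\!]\, s_2 = \mathsf{true}$ by Lemma 1. Applying the equality $e_i' = e_i$ gives us $[\![e_i']\!]\, s_2 = \mathsf{true}$, which is sufficient to ensure that the transition $\gamma_2 \xrightarrow{P'} \langle \pi_V(k'_i), (s_2, h_2) \rangle$ exists. That $\gamma_1' \mathrel{R} \langle \pi_V(k'_i), (s_2, h_2) \rangle$ then follows from our assumption that $s_1 =_V s_2$.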

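CASE $k_1 = \mathsf{abort}$: In this case, $\gamma_1' = \mathsf{error}$. Also, $k_2 = \pi_V(k_1) = \mathsf{abort}$, which ensures $\gamma_2 \xrightarrow{P'} \mathsf{error}$. Since $\mathsf{error} \mathrel{R} \mathsf{error}$ we are done.

CASE $k_1 = \mathsf{halt}$: In this case, $k_2 = \pi_V(k_1) = \mathsf{halt}$. We have $\gamma_1 \xrightarrow{P} \mathsf{final}(s_1, h_1)$ and $\gamma_2 \xrightarrow{P'} \mathsf{final}(s_2, h_2)$. From $\gamma_1 \mathrel{R} \gamma_2$ and the definition of $R$ we have $s_1 =_V s_2$, which implies that $\mathsf{final}(s_1, h_1) \mathrel{R} \mathsf{final}(s_2, h_2)$.

CASE $k_1 = \mathsf{goto}\ l$: In this case, $k_2 = \pi_V(k_1) = \mathsf{goto}\ l$. We have $\gamma_1 \xrightarrow{P} \mathsf{goto}(l, (s_1, h_1))$ and $\gamma_2 \xrightarrow{P'} \mathsf{goto}(l, (s_2, h_2))$. From $\gamma_1 \mathrel{R} \gamma_2$ and the definition of $R$ we have $s_1 =_V s_2$, which implies that $\mathsf{goto}(l, (s_1, h_1)) \mathrel{R} \mathsf{goto}(l, (s_2, h_2))$. □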

4.4.2  Combining  Projection  and  Instrumentation 

We  have  shown  that  a  program  is  simulated  by  any  of  its  instrumentations  and  that  an 
instrumentation  (or  any  other  program)  is  simulated  by  any  of  its  projections.  As  one  of 
our  goals  is  to  use  numeric  programs,  which  are  projections  of  instrumentations,  to  reason 
about  the  original  program,  we  need  to  obtain  a  result  relating  numeric  programs  to  the 
original  program.  Figure  4.13  summarizes  the  situation. 
Figure 4.13: A summary of the current state of the technical development. (Notation guide: syntactic relationship, semantic relationship, related theorem.)

The  following  theorem  ties  the  two  endpoints  in  this  figure  together,  describing  the 
simulation  result  that  holds  of  projections  of  instrumentations. 

Theorem 24. (Projections of Instrumentations) If $\Gamma \vdash P \blacktriangleright_V \hat{P}$ and $P' = \pi_{V'}(\hat{P})$ and $Q_0 = \Gamma(\mathit{initloc}(P))$ then

$$((P \mid Q_0)) \trianglelefteq_{=_{V \cap V'}} ((P' \mid Q_0))$$

Proof. The result follows from Theorem 22, Theorem 23, Theorem 18, and Theorem 13. By Theorem 22 we have some $R$ such that $((P \mid Q_0)) \sqsubseteq_{R,\,=_V} ((\hat{P} \mid Q_0))$. By Theorem 23 we have an $R'$ such that $((\hat{P} \mid Q_0)) \sqsubseteq_{R',\,=_{V'}} ((P' \mid Q_0))$. Applying Theorem 18 to each of these yields

$$((P \mid Q_0)) \trianglelefteq_{=_V} ((\hat{P} \mid Q_0))$$

and

$$((\hat{P} \mid Q_0)) \trianglelefteq_{=_{V'}} ((P' \mid Q_0))$$

Expanding the definitions of $=_V$ and $=_{V'}$ allows us to verify the following.

$$\forall a, b, c.\ (a =_V b) \land (b =_{V'} c) \Rightarrow (a =_{V \cap V'} c)$$

The proof is by case analysis on $a$. To take a representative case, suppose $a = \mathsf{final}(s, h)$. Then $b = \mathsf{final}(s', h')$ with $s =_V s'$ and $c = \mathsf{final}(s'', h'')$ with $s' =_{V'} s''$. We must show that $\mathsf{final}(s, h) =_{V \cap V'} \mathsf{final}(s'', h'')$. This is the case if we can show $s =_{V \cap V'} s''$. This requires showing $\forall x.\ (x \in V \cap V') \Rightarrow s(x) = s''(x)$. If $x \in V \cap V'$ then $x \in V$ and $x \in V'$. This allows us to use our assumptions $s =_V s'$ and $s' =_{V'} s''$ to conclude $s(x) = s''(x)$.

Theorem 13 then combines these results, giving us

$$((P \mid Q_0)) \trianglelefteq_{=_{V \cap V'}} ((P' \mid Q_0))$$

□
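The composition step in this proof can be spot-checked by brute force on small stores. The sketch below is only an illustration: stores are modelled as Python dictionaries, and the names `agree_on` and `composition_holds`, the variable names, and the two-valued domain are assumptions made for the example.

```python
from itertools import product


def agree_on(s1, s2, variables):
    """s1 =_V s2: the two stores agree on every variable in `variables`."""
    return all(s1[x] == s2[x] for x in variables)


def composition_holds(all_vars, V, V2, values=(0, 1)):
    """Check, over every triple of stores with the given variables and values,
    that s =_V s' and s' =_V2 s'' together imply s =_(V & V2) s''."""
    stores = [dict(zip(all_vars, vals))
              for vals in product(values, repeat=len(all_vars))]
    for s, s1, s2 in product(stores, repeat=3):
        if agree_on(s, s1, V) and agree_on(s1, s2, V2):
            if not agree_on(s, s2, V & V2):
                return False
    return True


# V holds the original program variables, V2 the projection variables.
ok = composition_holds(("x", "n", "n1"), {"x", "n"}, {"n", "n1"})
```

Here only `n` lies in both sets, so only the value of `n` is guaranteed to be carried from the first store to the third. This matches the remark that properties are preserved over the variables in the intersection of the two sets.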

The result of this is that numeric programs preserve LTSLP properties over variables in $V \cap V'$. In practical terms, this means that, provided we include all of the integer-valued variables from the original program in the projection, then any LTSLP property over these original integer variables can be checked by analyzing $P'$.


4.5  Example 

We now consider an example that shows how the translation to numeric programs can be used to check program properties (and also how choosing the wrong numeric program can result in an inability to prove the desired property, an unsurprising result given that numeric programs over-approximate the behavior of the original program).

Figure 4.14 gives a program that traverses a circular linked list rooted at x. The main loop checks whether x.next = x. This is true if and only if the list contains only one element. If the list has more than one element, then (x.next).data¹ is compared to v. If it is less than or equal to v, then the list cell at x.next is removed. Otherwise, v is set to (x.next).data. This will cause the cell at x.next to be freed during the next iteration.

¹ We use C-style multiple dereference for clarity. The intermediate variables x', y and t are used in Figure 4.14 since our language does not support multiple dereference, nor dereference inside of expressions.
    L0 : goto L1

    L1 : y := x.next;
         branch y = x ⇒ halt,
                y ≠ x ⇒ x' := x.next;
                        t := x'.data;
                        goto L2
         end

    L2 : branch t ≤ v ⇒ x.next := x'.next;
                        free x';
                        goto L1,
                t > v ⇒ v := x'.data;
                        goto L1
         end

Figure 4.14: An example program that traverses a circular linked list, conditionally freeing elements.
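To see the behaviour of this program on a concrete heap, the following sketch executes it directly in Python. The heap encoding is an assumption made for illustration: cells are numbered from 0, cell `i` initially points to cell `(i + 1) % len(data)`, `x` is cell 0, and the function name `run_figure_4_14` is hypothetical.

```python
def run_figure_4_14(data, v):
    """Run the program of Figure 4.14 on a circular list whose cells carry the
    integers in `data`. Returns (cells remaining, number of visits to L2)."""
    n = len(data)
    nxt = {i: (i + 1) % n for i in range(n)}   # next fields
    dat = dict(enumerate(data))                # data fields
    x = 0
    l2_visits = 0
    while True:
        y = nxt[x]                     # L1: y := x.next
        if y == x:                     # y = x => halt
            return len(nxt), l2_visits
        xp = nxt[x]                    # x' := x.next
        t = dat[xp]                    # t := x'.data
        l2_visits += 1                 # goto L2
        if t <= v:
            nxt[x] = nxt[xp]           # x.next := x'.next
            del nxt[xp]                # free x'
            del dat[xp]
        else:
            v = dat[xp]                # v := x'.data


cells, visits = run_figure_4_14([5, 9, 2, 7], 0)
```

On every input the run ends with a single cell, and a cell is skipped at most once before it is freed, so a list of `n` cells needs at most `2 * (n - 1)` visits to L2. The instrumentation discussed next is what lets a numeric tool establish this kind of fact without looking at the heap.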


In order to show that this program terminates, we will produce an instrumentation that tracks the following two instrumentation variables.

    n    the size of the linked list at x
    z    the value present at (x.next).data

We will use the following inductive definition to represent the circular linked list.

$$ls(n, \mathit{first}, \mathit{next}) = (\mathsf{emp} \land \mathit{first} = \mathit{next} \land n = 0) \lor (\exists z, d.\ (\mathit{first} \mapsto [\mathit{next} : z,\ \mathit{data} : d]) * ls(n - 1, z, \mathit{next}))$$

First, we present an instrumentation tracking only $n$, the size of the linked list. The left half of Figure 4.15 presents the instrumented program. We consider executions starting from the precondition $\exists n.\ ls(n, x, x) \land n \geq 1$ indicating that there is a non-empty circular linked list at $x$. We underline the instrumentation commands in order to make it more clear which commands were added. The first instrumentation command $n := {?}$ allows us to remove the quantifier on $n$ from the precondition and reason from $\Gamma(L_1)$ (displayed at the bottom of Figure 4.15). The removal of an element from the list corresponds to a decrease of $n$ by 1. The command $\mathsf{assume}(n = 1)$ records a pure consequence of the branch condition $y = x$. As $y$ is $x.\mathit{next}$, we have $y = x$ exactly when the list contains a single cell.

The right half of Figure 4.15 gives the numeric program obtained by projecting the instrumented program onto the singleton set $\{n\}$. The branches from the original program become non-deterministic branches and we are left with only the assume commands involving $n$ and the update to $n$ in the first branch of the continuation at $L_2$. This program is not a sufficiently precise abstraction to enable us to show termination. While we are able to model the fact that $n$ is decreasing, we cannot show that the branch which decreases $n$ is taken infinitely often. It could, for example, be the case that the second branch of the continuation at $L_2$ is always taken. While it is not sufficient for termination, this numeric program does allow us to prove some non-trivial properties. For example, we can show that $n$ is non-increasing, represented by the following LTSLP formula.

$$G((\mathit{atloc}(L_1) \land n = n_0) \supset G(\mathit{atloc}(L_1) \supset n \leq n_0))$$

Note the use of the ghost variable $n_0$ to capture the current value of $n$. Since $n_0$ does not appear in the program, its value is never changed. Since the precondition does not mention $n_0$, it can have any value in the initial state. This ensures that there are traces for which the antecedent $\mathit{atloc}(L_1) \land n = n_0$ is true. The use of implication then confines our attention to those traces when evaluating the rest of the formula.
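Both observations about the projection onto {n} can be reproduced with a small state-space exploration. This is a sketch under assumptions: states are modelled as `(location, n)` pairs, a failed assume is treated as a path with no successor, and the function names are invented for the example. The bounded search is a sanity check, not a proof of the LTSLP formula.

```python
def successors(state):
    """One step of the numeric program of Figure 4.15 (projection onto {n})."""
    loc, n = state
    if loc == "L1":
        out = []
        if n == 1:
            out.append(("halt", n))    # true => assume(n = 1); halt
        if n > 1:
            out.append(("L2", n))      # true => assume(n > 1); goto L2
        return out
    if loc == "L2":
        return [("L1", n - 1),         # true => n := n - 1; goto L1
                ("L1", n)]             # true => goto L1
    return []                          # halted


def n_never_exceeds(n0, depth):
    """Explore all paths of length <= depth from L1 with n = n0 and check that
    n <= n0 at every later visit to L1."""
    frontier = {("L1", n0)}
    for _ in range(depth):
        frontier = {t for s in frontier for t in successors(s)}
        if any(n > n0 for loc, n in frontier if loc == "L1"):
            return False
    return True


def can_loop_forever(n0):
    """True if L1 with n = n0 can return to itself in two steps, which gives an
    infinite path that always takes the second branch at L2."""
    start = ("L1", n0)
    return start in {t for s in successors(start) for t in successors(s)}
```

The second function exhibits exactly the non-terminating path described above, which exists for every starting value of `n` greater than 1.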

We now move on to an instrumented version of the program that also tracks $z$, the current contents of x.next.data. The left half of Figure 4.16 gives the instrumented version of the program and the right half of the same figure contains the numeric program obtained by projecting this instrumented program onto the set of variables $\{n, z, v\}$. This program can be shown to terminate since the existence of $z$ enables us to track the contents of x.next.data across iterations of the loop at location $L_1$. Specifically, we can now show that in the numeric program, the second case of the branch at $L_2$ cannot occur infinitely often.
Instrumented Program

    L0 : n := ?;
         goto L1

    L1 : y := x.next;
         branch y = x ⇒ assume(n = 1);
                        halt,
                y ≠ x ⇒ assume(n > 1);
                        x' := x.next;
                        t := x'.data;
                        goto L2
         end

    L2 : branch t ≤ v ⇒ x.next := x'.next;
                        free x';
                        n := n - 1;
                        goto L1,
                t > v ⇒ v := x'.data;
                        goto L1
         end

Numeric Program

    L0 : n := ?;
         goto L1

    L1 : branch true ⇒ assume(n = 1);
                       halt,
                true ⇒ assume(n > 1);
                       goto L2
         end

    L2 : branch true ⇒ n := n - 1;
                       goto L1,
                true ⇒ goto L1
         end

$$\begin{array}{rcl}
\Gamma(L_0) & = & \exists n.\ ls(n, x, x) \land n \geq 1 \\
\Gamma(L_1) & = & ls(n, x, x) \land n \geq 1 \\
\Gamma(L_2) & = & \exists a, b.\ (x \mapsto [\mathit{next} : x',\ \mathit{data} : a] * x' \mapsto [\mathit{next} : b,\ \mathit{data} : t] * ls(n - 2, b, x)) \land n > 1
\end{array}$$

Figure 4.15: An instrumented version of the program in Figure 4.14 and the corresponding projection onto the set $\{n\}$.
The reason is that executing this branch sets $v$ to $z$, which then prevents the $\mathsf{assume}(z > v)$ statement from being satisfied the next time $L_2$ is reached, forcing execution to proceed along the first case of the branch. Thus, at least every other iteration of the loop at $L_2$ results in $n$ decreasing by 1. If $n$ is initially greater than or equal to 1 (a situation which the assume statements at $L_1$ force), then eventually $n$ will be equal to 1 and the program will halt.
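The "every other iteration" argument gives a concrete bound of $2(n - 1)$ visits to $L_2$, which can be checked exhaustively over a small value domain. The sketch below makes several assumptions that are not part of the formal development: the integers chosen by `?` are restricted to the finite set `DOMAIN`, a failed assume ends the path, and `max_l2_visits` is a hypothetical helper name.

```python
from functools import lru_cache

DOMAIN = range(4)      # assumed finite stand-in for the integers that `?` may choose


@lru_cache(maxsize=None)
def max_l2_visits(n, z, v):
    """Most visits to L2 on any path of the numeric program of Figure 4.16
    that starts at L1 in state (n, z, v)."""
    if n <= 1:
        return 0                       # n = 1 halts; n < 1 fails both assumes at L1
    if z <= v:
        # Only the first case of L2 passes its assume:
        # assume(z <= v); n := n - 1; z := ?; goto L1
        return 1 + max(max_l2_visits(n - 1, z2, v) for z2 in DOMAIN)
    # Only the second case passes: assume(z > v); v := ?; assume(v = z); goto L1.
    # The havoc followed by assume(v = z) amounts to v := z.
    return 1 + max_l2_visits(n, z, z)


# Worst case over all starting values of z and v, for each starting n.
worst = {n: max(max_l2_visits(n, z, v) for z in DOMAIN for v in DOMAIN)
         for n in range(1, 7)}
```

The recursion itself mirrors the termination argument: after the second case runs, the state has `z = v`, so the next visit must take the first case and decrease `n`. A termination prover would establish the same fact symbolically rather than by enumeration.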

Finally,  we  consider  a  liveness  property  other  than  termination.  Consider  the  numeric 
program  in  Figure  4.17.  This  is  the  same  program  that  was  on  the  right  side  of  Figure  4.16, 
but  with  the  two  cases  of  the  branch  at  L2  split  into  their  own  continuations.  This  allows 
us  to  write  LTSL  formulae  that  specify  which  branch  is  taken. 

One  example  of  such  a  formula  is  the  following,  which  states  that  it  is  always  the  case 
that  after  an  execution  visits  label  L4,  it  eventually  visits  label  L3. 

$$G(\mathit{atloc}(L_4) \supset F(\mathit{atloc}(L_3)))$$

If  L4  were  associated  with  a  request  and  L3  with  a  response,  then  this  formula  would  state 
that  every  request  is  eventually  responded  to. 

Note that all of the properties we have considered are universal in that they hold if and only if they hold of all program traces. This is the nature of LTSL formulae. We cannot write statements in LTSL that describe existential path properties. An example of such a property is “there are traces in which $n > 1$ is true at $L_1$ but $L_4$ is never visited.” Since numeric programs are over-approximations of the original program, such existential properties are not necessarily preserved (it is possible that such a property could hold of the numeric program but not hold of the original program).


Instrumented Program

    L0 : n := ?; z := ?; goto L1

    L1 : y := x.next;
         branch y = x ⇒ assume(n = 1);
                        halt,
                y ≠ x ⇒ assume(n > 1);
                        x' := x.next;
                        t := x'.data;
                        goto L2
         end

    L2 : branch t ≤ v ⇒ assume(z ≤ v);
                        x.next := x'.next;
                        free x';
                        n := n - 1;
                        z := ?;
                        goto L1,
                t > v ⇒ assume(z > v);
                        v := x'.data;
                        assume(v = z);
                        goto L1
         end

Numeric Program

    L0 : n := ?; z := ?; goto L1

    L1 : branch true ⇒ assume(n = 1);
                       halt,
                true ⇒ assume(n > 1);
                       goto L2
         end

    L2 : branch true ⇒ assume(z ≤ v);
                       n := n - 1;
                       z := ?;
                       goto L1,
                true ⇒ assume(z > v);
                       v := ?;
                       assume(v = z);
                       goto L1
         end

$$\begin{array}{rcl}
\Gamma(L_0) & = & \exists n.\ ls(n, x, x) \land n \geq 1 \\
\Gamma(L_1) & = & (\exists a, b, d.\ x \mapsto [\mathit{next} : a,\ \mathit{data} : d] * a \mapsto [\mathit{next} : b,\ \mathit{data} : z] * ls(n - 2, b, x)) \\
 & & \lor\ (x \mapsto [\mathit{next} : x,\ \mathit{data} : z] \land n = 1) \\
\Gamma(L_2) & = & \exists a, b, d.\ (x \mapsto [\mathit{next} : x',\ \mathit{data} : d] * x' \mapsto [\mathit{next} : b,\ \mathit{data} : z] * ls(n - 2, b, x)) \land z = t
\end{array}$$

Figure 4.16: An instrumentation and projection of the program in Figure 4.14, with instrumentation variables $n$ and $z$ and projection variables $n$, $z$, $v$.

    L0 : n := ?; z := ?; goto L1

    L1 : branch true ⇒ assume(n = 1);
                       halt,
                true ⇒ assume(n > 1);
                       goto L2
         end

    L2 : branch true ⇒ goto L3,
                true ⇒ goto L4
         end

    L3 : assume(z ≤ v);
         n := n - 1;
         z := ?;
         goto L1

    L4 : assume(z > v);
         v := ?;
         assume(v = z);
         goto L1

Figure 4.17: The numeric program from Figure 4.16, but rearranged so that the cases of the second branch are split into separate continuations.


4.6 Summary

We now summarize what we have accomplished in this chapter, collecting and combining the various theorems into their most useful forms. We first showed how to associate an instrumented program with an original program. We can reason about the safety and liveness behavior of the instrumented program and the properties satisfied by the instrumentation can be converted into properties that are satisfied by the original program.

Theorem 25. Let $Q_0 = \Gamma(\mathit{initloc}(P))$. If $\Gamma \vdash P \blacktriangleright_V \hat{P}$ and $\varphi \in \mathrm{LTSL}$ then $((\hat{P} \mid Q_0)) \models \varphi$ implies $((P \mid Q_0)) \models \exists(\overline{V}, \varphi)$.

Proof. This theorem is the result of combining Theorem 22, Theorem 18, Corollary 2, and Lemma 11. By Theorem 22 we have

$$((P \mid Q_0)) \sqsubseteq_{R_{V,\Gamma},\,=_V} ((\hat{P} \mid Q_0))$$

From Theorem 18 we then have

$$\mathit{traces}(((P \mid Q_0))) \trianglelefteq_{=_V} \mathit{traces}(((\hat{P} \mid Q_0)))$$

If we let $V' = \mathit{fv}(\varphi) - V$, then Corollary 2 gives us

$$((P \mid Q_0)) \models \exists(V', \varphi)$$

To complete the proof we need only show that $V' \subseteq \overline{V}$ and apply Lemma 11. To show this, suppose that $x \in V'$. Then $x \in \mathit{fv}(\varphi)$ and $x \notin V$. This last fact implies $x \in \overline{V}$ (since $\overline{V}$ is the complement of $V$). This establishes $V' \subseteq \overline{V}$. □

Instrumented programs let us introduce additional variables and commands and use these to prove properties of the original program. However, we will usually want to decompose the verification problem further, using projection to obtain a program that only involves integer-valued variables and then passing this program to an external verification tool. The following theorem states what we can conclude about the original program if we use such a method.

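Theorem 26. Let $Q_0 = \Gamma(\mathit{initloc}(P))$. If the following hold

1. $\Gamma \vdash P \blacktriangleright_V \hat{P}$ and $\varphi \in \mathrm{LTSL}$ and $((\hat{P} \mid Q_0)) \models \varphi$

2. $P' = \pi_{V'}(\hat{P})$ and $\varphi' \in \mathrm{LTSLP}(V')$ and $((P' \mid Q_0)) \models \varphi'$

then $((P \mid Q_0)) \models \exists(\overline{V}, \varphi \land \varphi')$.

Proof. This theorem is primarily a combination of Theorem 23 and Theorem 25. Suppose condition 2 holds. Then by Theorem 23 we have that there is some relation $R'$ such that $((\hat{P} \mid Q_0)) \sqsubseteq_{R',\,=_{V'}} ((P' \mid Q_0))$. By Theorem 18 we have $((\hat{P} \mid Q_0)) \trianglelefteq_{=_{V'}} ((P' \mid Q_0))$. By Theorem 16 we have that $\varphi'$ is $=_{V'}$-invariant. Then by Corollary 1 we have that $((P' \mid Q_0)) \models \varphi'$ (which we have) implies $((\hat{P} \mid Q_0)) \models \varphi'$. Since we also have $((\hat{P} \mid Q_0)) \models \varphi$, we have $((\hat{P} \mid Q_0)) \models \varphi \land \varphi'$. This holds since for any trace $T$ in $\mathit{traces}(((\hat{P} \mid Q_0)))$, we have $T \models \varphi$ and $T \models \varphi'$, which according to the semantics of LTSL implies that $T \models \varphi \land \varphi'$.

Finally, we note that $\varphi \land \varphi'$ is an LTSL formula and thus Theorem 25 applied to $((\hat{P} \mid Q_0)) \models \varphi \land \varphi'$ and $\Gamma \vdash P \blacktriangleright_V \hat{P}$ gives us $((P \mid Q_0)) \models \exists(\overline{V}, \varphi \land \varphi')$. □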




4.7  Conclusion 


The instrumentation analysis given in the next chapter gives a method of automatically generating instrumented programs and thus numeric abstractions. But there are likely to be other approaches to instrumentation analysis that differ in their efficiency, completeness, and generality. Thus, one of the primary technical contributions of this thesis is that the rules given for checking $\Gamma \vdash P \blacktriangleright_V \hat{P}$ are sufficient to ensure that $\pi_{V'}(\hat{P})$ simulates $P$. This gives a well-defined target for analyses that produce numeric abstractions of programs in much the same way that partial correctness proofs in Hoare logic provide a common target for safety analyses. In fact, the process of generating an instrumented program can be viewed as a generalization of the process of proving partial correctness. The invariants $\Gamma$ that are required are valid partial correctness invariants, but the proving process is relaxed in the sense that, rather than only working with invariants, we are allowed to also insert instrumentation commands.

In  this  sense,  the  process  is  similar  to  program  proving  in  Hoare  logic  with  auxiliary 
variables,  for  example  as  described  in  [Owicki  and  Gries,  1976].  A  major  difference  is  due 
to the handling of non-determinism. Our Inst-Exists rule lets us insert a command x := ?
when we have the precondition $\exists x.\, Q$ in order to reason from $Q$. And our Inst-Disj rule
lets us insert branch true ..., true ... end when we have the precondition $Q_1 \vee Q_2$
in order to reason separately from $Q_1$ and $Q_2$. Such operations are not allowed in standard
Hoare  logic  with  auxiliary  variables.  The  reason  the  two  methods  differ  is  that  we  are 
interested  in  properties  preserved  by  simulation,  which  requires  the  existence  of  some 
transition  with  a  given  property,  whereas  Hoare  logic  for  partial  correctness  is  interested 
in  properties  that  hold  for  all  transitions.  Another  reason  for  the  difference  is  that  we  are 
only  translating  one  program  to  another,  whereas  Hoare  logic  is  concerned  with  proving 
properties of programs. Once we have added the new commands to the program and turned
our attention to the problem of proving program properties, we switch to a universal view
of  transitions,  checking  that  a  property  holds  of  all  paths. 

One contribution of the approach we have taken in this chapter is the careful separation
of the addition of auxiliary / instrumentation variables from the process of proving
program properties. Once we start down this path, we see that the traditional restrictions
on auxiliary variables are overly harsh. By relaxing these, we obtain rules that exhibit a
novel correspondence between existential variables and non-deterministic assignment and
between disjunction and non-deterministic choice.
Chapter 5

Instrumentation  Analysis 


In  this  chapter,  we  present  an  automated  algorithm  for  generating  instrumented  programs 
of  the  form  given  in  Chapter  4.  We  call  such  an  automated  procedure  an  instrumentation 
analysis.  The  algorithm  proceeds  by  performing  a  shape  analysis  on  the  program,  which 
enables it to discover an appropriate mapping $\Gamma$ for the proof that $\Gamma \vdash \hat{P} \blacktriangleright_V P$. During
the  analysis  process,  the  algorithm  also  inserts  instrumentation  commands  at  certain  points 
in  order  to  record  information  about  numeric  properties.  The  syntax-directed  projection 
operation  presented  in  Section  4.4  can  then  be  used  to  generate  a  numeric  program  from  the 
instrumented  program  produced  by  the  instrumentation  analysis.  We  have  implemented 
this  algorithm  in  a  tool  called  Thor  [Magill  et  al.,  2008],  which  is  able  to  generate  numeric 
abstractions  of  C  programs  using  the  techniques  described  in  this  thesis. 

The portion of the analysis that is concerned with the generation of $\Gamma$ can be described
as an abstract interpretation [Cousot and Cousot, 1977] where the abstract domain consists
of separation logic formulae of a restricted form. However, familiarity with abstract
interpretation will not be required in order to understand the presentation of the algorithm
that we provide here. While we will use some terms from the abstract interpretation
framework, we will describe the algorithm in terms of our goal of generating instrumented
programs according to the rules in Chapter 4. For a description of this style of shape
analysis in abstract interpretation terms, see [Distefano et al., 2006] and [Berdine et al.,
2007].

$$
\begin{array}{lrcl}
\text{Inductive Predicates} & d & \in & \mathcal{D} \\
\text{Records} & \rho & ::= & \epsilon \mid f : e,\ \rho \\
\text{Spatial Predicates} & \Xi & ::= & \mathbf{emp} \mid e \mapsto [\rho] \mid d(\vec{e}) \\
\text{Spatial Formulae} & \Sigma & ::= & \Xi \mid \Sigma * \Sigma \\
\text{Pure Formulae} & \Pi & ::= & \mathbf{true} \mid \mathbf{false} \mid e_1 = e_2 \mid e_1 < e_2 \mid \neg\Pi \mid \Pi_1 \wedge \Pi_2 \\
\text{Symbolic State Formulae } (\Phi) & \varphi & ::= & \exists \vec{x}.\ \Sigma \wedge \Pi
\end{array}
$$

Figure 5.1: Restricted subset of separation logic formulae. The notation $\vec{x}$ indicates a list of
variables $x_1, x_2, \ldots, x_n$ and $\exists \vec{x}.\, Q$ is shorthand for $\exists x_1 \exists x_2 \cdots \exists x_n.\, Q$.


We  begin  our  discussion  by  describing  the  restricted  form  of  separation  logic  formulae 
used  by  the  automated  analysis. 


5.1  Symbolic  State  Formulae 

Figure 5.1 gives the restricted set of separation logic formulae used in the automated
analysis. Working in this subset simplifies the theorem proving problem that we discuss in
Section 5.5 and also results in simple predicate transformers for the commands in our
language. We write $\vec{x}$ to represent a list of variables $x_1, x_2, \ldots, x_n$. We will implicitly convert
these ordered lists into unordered sets as needed when stating certain properties. Such
conversions will be obvious due to the set notation used. For example, $\vec{x} \cup \vec{y}$ represents the set
consisting of the elements of $\vec{x}$ together with those in $\vec{y}$. The notation $y \in \vec{x}$ indicates that
$y$ is a member of the set consisting of the elements of $\vec{x}$.

We would like to identify formulae that are logically equivalent. However, logical
equivalence of separation logic formulae cannot always be accurately determined.¹ For
this reason, our implementation may distinguish some formulae that are actually
equivalent. This does not affect soundness of the approach, but can affect completeness. We
assume that the implemented equivalence check at least identifies formulae that are
related by the equivalence relation given in Figure 5.2. This considers formulae equivalent
up to commutativity and associativity of $*$, the unit law for emp, renaming of quantified
variables, and re-ordering of existential quantifiers.

¹ The undecidability of separation logic formulae, as we have defined them, follows from the fact that
they contain the integers with addition, multiplication, and existential quantification as a fragment of the
logic. Decidability of this fragment is Hilbert's 10th problem and was shown to be undecidable by Davis,
Matiyasevich, Putnam, and Robinson. Decidability of fragments of the logic not including multiplication has
been explored to some extent by [Berdine et al., 2004] and [Bozga et al., 2008], but much is still unknown.

$$
\Sigma * \mathbf{emp} = \Sigma
\qquad
\Sigma_1 * \Sigma_2 = \Sigma_2 * \Sigma_1
\qquad
\Sigma_1 * (\Sigma_2 * \Sigma_3) = (\Sigma_1 * \Sigma_2) * \Sigma_3
$$

$$
\frac{x' \notin \mathrm{fv}(\Sigma \wedge \Pi)}
     {\exists \vec{x}_1, x, \vec{x}_2.\ \Sigma \wedge \Pi \;=\; \exists \vec{x}_1, x', \vec{x}_2.\ \Sigma[x'/x] \wedge \Pi[x'/x]}
$$

$$
\exists \vec{x}_1, x, x', \vec{x}_2.\ \Sigma \wedge \Pi \;=\; \exists \vec{x}_1, x', x, \vec{x}_2.\ \Sigma \wedge \Pi
\qquad
\frac{\Sigma_1 = \Sigma_2}
     {\exists \vec{x}.\ \Sigma_1 \wedge \Pi \;=\; \exists \vec{x}.\ \Sigma_2 \wedge \Pi}
$$

Figure 5.2: Equivalence relation for symbolic state formulae.

The set $\Phi$ is closed with respect to $*$ in the sense that the $*$-conjunction $\varphi * \varphi'$ of
elements of $\Phi$ is semantically equivalent to an element $\varphi'' \in \Phi$ (according to the semantics
given in Figure 2.7). The element $\varphi''$ is defined as follows. Let $\varphi = \exists \vec{v}.\ \Sigma \wedge \Pi$ and
$\varphi' = \exists \vec{v}'.\ \Sigma' \wedge \Pi'$ such that $\mathrm{fv}(\Sigma \wedge \Pi) \cap \vec{v}' = \emptyset$ and $\mathrm{fv}(\Sigma' \wedge \Pi') \cap \vec{v} = \emptyset$ (these constraints
can always be satisfied by renaming quantified variables). Then we have the following

$$
\varphi * \varphi' \iff \exists \vec{v}, \vec{v}'.\ (\Sigma * \Sigma') \wedge (\Pi \wedge \Pi')
$$

and this is in $\Phi$.

Similarly, $\Phi$ is closed with respect to conjunction of pure formulae (for all $\varphi \in \Phi$
there is a $\varphi' \in \Phi$ such that $(\varphi \wedge \Pi') \Leftrightarrow \varphi'$). These operations will be used freely with the
understanding that they refer not to a general separation logic formula that falls outside of
$\Phi$, but rather to the element of $\Phi$ semantically equivalent to that formula.
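
The closure of symbolic state formulae under $*$ can be sketched in a few lines of Python. This is only an illustration and not Thor's implementation: the `SymState` representation (atoms as tuples of strings), the helper names, and the fresh-name scheme `v0, v1, ...` are all assumptions made for the example. For simplicity it renames every quantified variable to a fresh name, which is stronger than the side conditions above require.

```python
from dataclasses import dataclass
from itertools import count
from typing import Tuple

Atom = Tuple[str, ...]  # (symbol, var, var, ...): an assumed, simplified encoding


@dataclass(frozen=True)
class SymState:
    # Represents: exists qvars. (spatial atoms joined by *) and (pure atoms joined by "and")
    qvars: Tuple[str, ...]
    spatial: Tuple[Atom, ...]
    pure: Tuple[Atom, ...]


def all_vars(phi):
    """Every variable name occurring in phi, bound or free."""
    used = {v for atom in phi.spatial + phi.pure for v in atom[1:]}
    return used | set(phi.qvars)


def free_vars(phi):
    return all_vars(phi) - set(phi.qvars)


def rename(phi, mapping):
    """Apply a renaming to the quantifier list and to every atom."""
    def sub(atom):
        return (atom[0],) + tuple(mapping.get(v, v) for v in atom[1:])
    return SymState(
        tuple(mapping.get(v, v) for v in phi.qvars),
        tuple(sub(a) for a in phi.spatial),
        tuple(sub(a) for a in phi.pure),
    )


def star(phi, psi):
    """Return a formula of the restricted form that is equivalent to phi * psi."""
    taken = all_vars(phi) | all_vars(psi)
    candidates = (f"v{i}" for i in count())

    def freshen(f):
        mapping = {}
        for q in f.qvars:
            name = next(n for n in candidates if n not in taken)
            taken.add(name)
            mapping[q] = name
        return rename(f, mapping)

    a, b = freshen(phi), freshen(psi)
    # exists v, v'. (Sigma * Sigma') and (Pi and Pi')
    return SymState(a.qvars + b.qvars, a.spatial + b.spatial, a.pure + b.pure)
```

For example, combining `exists z. x |-> z and z = y` with `exists z. ls(y, z)` yields a formula with two distinct quantified variables, so the two uses of `z` are not accidentally identified.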


5.2  Inductive  Predicate  Specifications 

In order to reason about data structures, our tool incorporates support for inductive
predicate specifications. We use the term “specification” rather than “definition” deliberately,
as these specifications differ from definitions in two key ways.

First, the syntax for specifications adds additional structure beyond that present in
definitions. This structure serves to separate the instrumentation variables from the program
variables in a way that simplifies automatic reasoning.

Secondly, we allow multiple specifications for the same predicate name, whereas only
a single definition for each name was permitted in Section 2.2.2. This allows inductive
consequences of definitions to be provided to the tool. Such consequences cannot be
inferred by the tool, as the automated analysis does not perform inductive reasoning.
Allowing multiple specifications for the same predicate has implications for the semantics of
specifications, and we will formally connect this semantics to the semantics of definitions
given previously. One consequence of this decision to allow multiple specifications is that
it provides opportunity for the user to introduce inconsistency into the system. We address
this concern with Theorem 27.

Syntax 

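The syntax for inductive specifications is given in Figure 5.3.

$$
\begin{array}{lrcl}
\text{Predicate Names} & d & \in & \mathcal{D} \\
\text{Inductive Specification} & S_d & ::= & d(\underline{\vec{x}}; \vec{y}) \Leftrightarrow C_1(\underline{\vec{x}}; \vec{y}) \text{ ‘|’ } \ldots \text{ ‘|’ } C_n(\underline{\vec{x}}; \vec{y}) \\
\text{Case} & C(\underline{\vec{x}}; \vec{y}) & ::= & \Pi : \mathbf{let}\ \underline{\vec{z}}\ \mathbf{satisfy}\ \Pi'\ \mathbf{in}\ \varphi
\end{array}
$$

where $\mathrm{fv}(\Pi) \subseteq \underline{\vec{x}}$ and $\mathrm{fv}(\Pi') \subseteq (\underline{\vec{x}} \cup \underline{\vec{z}})$
and $\mathrm{fv}(\varphi) \subseteq (\vec{y} \cup \underline{\vec{z}})$ and $\underline{\vec{x}}, \vec{y}, \underline{\vec{z}}$ distinct and disjoint

Figure 5.3: Syntax of inductive specifications as implemented in Thor. The notation ‘|’ is used
to indicate the literal character |, and distinguish it from the BNF grammar operator consisting of
the same symbol.

A predicate specification has the following form.

$$
d(\underline{\vec{x}}; \vec{y}) \Leftrightarrow C_1(\underline{\vec{x}}; \vec{y}) \mid \cdots \mid C_n(\underline{\vec{x}}; \vec{y})
$$

The variable $d$ is the name of the inductive predicate we are specifying. The variables
to the left of the semicolon, $\underline{\vec{x}}$, are referred to as instrumentation parameters. These
parameters represent integer-valued quantities that we want our analysis to track with
instrumentation variables, for example the length of a list or the height of a tree. We will
underline instrumentation parameters to help the reader identify them. The $C_i$ are cases of
the definition and have the following form.

$$
\Pi : \mathbf{let}\ \underline{\vec{z}}\ \mathbf{satisfy}\ \Pi'\ \mathbf{in}\ \varphi
$$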


The pure condition $\Pi$ is a constraint on the instrumentation parameters $\underline{\vec{x}}$ which gives
the condition that differentiates this case from the others. Often the $\Pi_i$ in the cases of a
definition will be non-overlapping in the sense that for any $i \neq j$ we have $\Pi_i \wedge \Pi_j \Rightarrow \mathbf{false}$.
For example, in the definition of a list of length $n$, we might have $n = 0$ and $n > 0$ as
our two conditions. However, this disjointness of conditions is not a requirement. For
example, a list predicate that does not track list length would simply have true for the
condition in both the base case and the inductive case.

Before  explaining  the  rest  of  the  syntax,  it  is  helpful  to  consider  a  concrete  example. 
Figure  5.4  shows  a  graphical  depiction  of  a  doubly-linked  list  segment.  The  inductive 
specification  for  this  segment  is  given  below.  The  syntax  [  ]  represents  an  empty  list. 
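[Figure: a chain of cells linked by next and prev pointers, with the labels first, last, n, and p.]

Figure 5.4: Graphical depiction of the doubly-linked list segment predicate.

$$
\begin{array}{l}
\mathrm{dll}(\underline{k};\ p, \mathit{first}, \mathit{last}, n) \Leftrightarrow \\
\quad\ \ \underline{k} = 0 : \mathbf{let}\ [\,]\ \mathbf{satisfy}\ \mathbf{true}\ \mathbf{in}\ \mathbf{emp} \wedge \mathit{first} = n \wedge \mathit{last} = p \\
\quad \mid \underline{k} > 0 : \mathbf{let}\ \underline{k'}\ \mathbf{satisfy}\ \underline{k} = \underline{k'} + 1\ \mathbf{in} \\
\qquad\quad \exists z.\ (\mathit{first} \mapsto [\mathit{prev} : p,\ \mathit{next} : z]) * \mathrm{dll}(\underline{k'};\ \mathit{first}, z, \mathit{last}, n)
\end{array}
$$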

The  parameters  first  and  last  are  the  addresses  of  the  first  and  last  cells  in  the  list 
segment.  The  parameter  p  is  the  contents  of  the  prev  field  of  the  first  element  and  the  n 
parameter  is  the  address  value  contained  in  the  next  field  of  the  last  element  of  the  segment. 
The  parameter  k  is  the  length  of  the  list. 

The specification can be read as saying that there are two possible cases for a list
segment with length $k$. Either $k = 0$, in which case the list is empty, or $k > 0$, in which
case the list is non-empty.

In the non-empty case, the sub-formula

$$
\exists z.\ (\mathit{first} \mapsto [\mathit{prev} : p,\ \mathit{next} : z]) * \mathrm{dll}(\underline{k'};\ \mathit{first}, z, \mathit{last}, n)
$$

indicates that the list can be split into the head element, given by the formula
$\mathit{first} \mapsto [\mathit{prev} : p,\ \mathit{next} : z]$, and the tail of the list, given by $\mathrm{dll}(\underline{k'};\ \mathit{first}, z, \mathit{last}, n)$. This
tail portion of the list has length $k'$. The rest of this case of the specification is concerned
with relating $k$ (the length of the full list segment) and $k'$ (the length of the sub-segment).

After the keyword “let,” a list of variables can appear. These are the variables that
appear as instrumentation parameters in recursive instances of inductive predicates in the
body of the case. Returning to our general syntax, reproduced below,

$$
C(\underline{\vec{x}}; \vec{y}) ::= \Pi : \mathbf{let}\ \underline{\vec{z}}\ \mathbf{satisfy}\ \Pi'\ \mathbf{in}\ \varphi
$$

the list $\underline{\vec{z}}$ gives the variables that will be passed as instrumentation parameters to inductive
predicates appearing in $\varphi$. The formula $\Pi'$ then relates $\underline{\vec{z}}$ to the instrumentation parameters
for the predicate being specified, which are given by $\underline{\vec{x}}$. In our doubly-linked list example,
$\Pi'$ for the non-empty case is $k = k' + 1$. Since the empty case contains no instances of
inductive predicates, the list of variables in that case is empty. This is the role of the [ ]
syntax: it represents an empty list.

To summarize, new variables will be added by our instrumentation analysis and used
to track quantities like the length of a list or the size of a tree. The specification of an
inductive predicate gives a list of possible expansions. Each expansion may expose
substructures which themselves have quantities to be tracked. The list $\underline{\vec{z}}$ contains the variables
representing these new quantities and each $\Pi'$ gives a relation between the variables in $\underline{\vec{x}}$
(the sizes passed into this predicate instance) and those in $\underline{\vec{z}}$ (the sizes passed to recursive
instances of the predicate). This relation is represented as an expression over variables in
$\underline{\vec{x}} \cup \underline{\vec{z}}$.

Syntactic  Connection  with  Inductive  Definitions 

Individual  specifications  are  very  closely  related  to  individual  inductive  definitions.  In 
fact,  they  differ  only  in  syntax.  Consider  the  specification  below. 

$$
d(\underline{\vec{x}}; \vec{y}) \Leftrightarrow C_1(\underline{\vec{x}}; \vec{y}) \mid \ldots \mid C_n(\underline{\vec{x}}; \vec{y})
$$

Let $\langle C_i \rangle$ be defined such that if $C_i$ is $\Pi : \mathbf{let}\ \underline{\vec{z}}\ \mathbf{satisfy}\ \Pi'\ \mathbf{in}\ \varphi$, then $\langle C_i \rangle = \Pi \wedge \exists \underline{\vec{z}}.\ (\Pi' \wedge \varphi)$.
Then the specification above corresponds to the definition below.

$$
d(\vec{x}, \vec{y}) = \langle C_1(\vec{x}; \vec{y}) \rangle \mid \ldots \mid \langle C_n(\vec{x}; \vec{y}) \rangle
$$

We will write $\langle S \rangle$ to denote the translation of specification $S$ to the syntax for definitions.
We also generalize this to sets of specifications. Let $\mathcal{S} = \{S_1, \ldots, S_n\}$ be a set of inductive
specifications. Then $\langle \mathcal{S} \rangle = \langle S_1 \rangle :: \ldots :: \langle S_n \rangle$ (where :: separates the elements in a
list of inductive definitions as used in Section 2.2.2). Note that while the translation of a
single specification is always a well-formed definition, the translation of a set of
specifications will not be a valid list of definitions if there are multiple specifications for the same
predicate name.

Multiple  Specifications 

Note that the specification of a doubly-linked list segment given previously is
“front-biased,” in that the heap cell exposed in the inductive case is at the front of the list. As
we will see when we describe our instrumentation algorithm, this will result in the
specification being useless for exposing cells at the back of the list, which is often necessary.
Multiple specifications solve this problem by providing multiple ways of viewing a data
structure. These various views are then all available for use during the analysis. An
example of a specification for accessing a doubly-linked list from the back is given below.

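$$
\begin{array}{l}
\mathrm{dll}(\underline{k};\ p, \mathit{first}, \mathit{last}, n) \Leftrightarrow \\
\quad\ \ \underline{k} = 0 : \mathbf{let}\ [\,]\ \mathbf{satisfy}\ \mathbf{true}\ \mathbf{in}\ \mathbf{emp} \wedge \mathit{first} = n \wedge \mathit{last} = p \\
\quad \mid \underline{k} > 0 : \mathbf{let}\ \underline{k'}\ \mathbf{satisfy}\ \underline{k} = \underline{k'} + 1\ \mathbf{in} \\
\qquad\quad \exists z.\ \mathrm{dll}(\underline{k'};\ p, \mathit{first}, z, \mathit{last}) * (\mathit{last} \mapsto [\mathit{prev} : z,\ \mathit{next} : n])
\end{array}
$$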

Unlike  the  previous  specification,  here  the  inductive  case  involves  exposing  the  points- 
to  predicate  at  the  end  of  the  list  segment.  These  specifications  are  equivalent  in  the  sense 
that,  if  they  are  taken  as  definitions,  they  define  the  same  set  of  structures.  In  fact,  we  can 
use  induction  on  the  length  of  the  list  to  show  that  each  definition  implies  the  other. 
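
To make the two views concrete, the sketch below checks a finite, explicitly given heap against the front-biased and the back-biased unfolding. It is a toy model and not part of Thor: the heap is assumed to be a Python dict from addresses to `(prev, next)` pairs, and the function names `dll_front` and `dll_back` are hypothetical. Each function consumes exactly one cell per unfolding, mirroring the separating conjunction.

```python
from typing import Dict, Tuple

Heap = Dict[int, Tuple[int, int]]  # address -> (prev field, next field)


def dll_front(h: Heap, k: int, p: int, first: int, last: int, n: int) -> bool:
    """Check dll(k; p, first, last, n) by exposing the cell at the front."""
    if k == 0:
        return not h and first == n and last == p
    if k < 0 or first not in h:
        return False
    prev, nxt = h[first]
    if prev != p:
        return False
    rest = {a: c for a, c in h.items() if a != first}
    # exists z. first |-> [prev: p, next: z] * dll(k - 1; first, z, last, n)
    return dll_front(rest, k - 1, first, nxt, last, n)


def dll_back(h: Heap, k: int, p: int, first: int, last: int, n: int) -> bool:
    """Check dll(k; p, first, last, n) by exposing the cell at the back."""
    if k == 0:
        return not h and first == n and last == p
    if k < 0 or last not in h:
        return False
    prev, nxt = h[last]
    if nxt != n:
        return False
    rest = {a: c for a, c in h.items() if a != last}
    # exists z. dll(k - 1; p, first, z, last) * last |-> [prev: z, next: n]
    return dll_back(rest, k - 1, p, first, prev, last)


# A three-cell segment at addresses 10, 20, 30 with null (0) at both ends.
EXAMPLE: Heap = {10: (0, 20), 20: (10, 30), 30: (20, 0)}
```

On `EXAMPLE`, both checkers accept length 3 and reject length 2, which is the behaviour one would expect from equivalent definitions. Agreement on a few heaps is of course only a sanity check, not a substitute for the inductive argument.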

However,  it  does  not  have  to  be  the  case  that  all  specifications  of  a  given  predicate 
are  equivalent.  Consider  the  specification  below,  which  lets  us  view  a  list  segment  as 
consisting  of  two  sub-segments. 

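$$
\begin{array}{l}
\mathrm{dll}(\underline{k};\ p, \mathit{first}, \mathit{last}, n) \Leftrightarrow \\
\quad \mathbf{true} : \mathbf{let}\ \underline{k_1}, \underline{k_2}\ \mathbf{satisfy}\ \underline{k} = \underline{k_1} + \underline{k_2}\ \mathbf{in} \\
\qquad \exists x, y.\ \mathrm{dll}(\underline{k_1};\ p, \mathit{first}, x, y) * \mathrm{dll}(\underline{k_2};\ x, y, \mathit{last}, n)
\end{array}
$$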

This  specification  is  not  equivalent  to  either  of  the  other  two.  In  fact,  taken  on  its 
own  as  a  definition,  it  has  multiple  fixed-points,  the  least  of  which  is  the  empty  set  of 
heaps — clearly  not  the  same  set  defined  by  the  other  specifications. 
However, the specification above is compatible with the others in the sense that, if we
take the forward or backward-oriented specification as our definition of dll, then the
specification above can be proved valid. Informally speaking (since we have not yet defined the
semantics of specifications), we have that the forward and backward specifications imply
the splitting specification above, but neither of the reverse implications hold. In Theorem
27 we formalize this idea of using some subset of the specifications to justify the others.
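
The fixed-point remarks can be illustrated with a deliberately coarse numeric model. The sketch below is an assumption-laden toy: it forgets heaps entirely and tracks only the set of lengths $k$ (up to an arbitrary bound) for which a predicate instance is derivable, then computes least fixed points by Kleene iteration from the empty set. The names `front_step`, `split_step`, `least_fixed_point`, and `BOUND` are hypothetical.

```python
BOUND = 5  # arbitrary cut-off so that the iteration is finite


def front_step(lengths):
    """One unfolding of the front-biased specification: k = 0, or k = k' + 1."""
    return {0} | {k + 1 for k in lengths if k + 1 <= BOUND}


def split_step(lengths):
    """One unfolding of the splitting specification: k = k1 + k2, with no base case."""
    return {a + b for a in lengths for b in lengths if a + b <= BOUND}


def least_fixed_point(step):
    """Kleene iteration from the empty set; terminates because the model is finite."""
    current = frozenset()
    while True:
        following = frozenset(step(current))
        if following == current:
            return current
        current = following


FRONT_LENGTHS = least_fixed_point(front_step)   # every length up to BOUND
SPLIT_LENGTHS = least_fixed_point(split_step)   # nothing is derivable
```

In this model the splitting specification on its own derives nothing, while the set obtained from the front-biased specification is also a fixed point of `split_step`. This mirrors, at the level of lengths only, the claim that the splitting specification has several fixed points and is compatible with the forward definition.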

Semantics 

In  Definition  6,  we  gave  the  semantics  of  a  set  of  inductive  definitions.  Inductive  definition 
sets  have  the  restriction  that  each  predicate  symbol  must  appear  at  most  once  on  the  left- 
hand  side  of  a  definition.  We  have  no  such  restriction  for  specifications.  In  fact,  a  primary 
reason  we  introduce  specifications  is  so  that  we  can  provide  multiple  specifications  for 
a  single  predicate  symbol.  As  such,  the  method  of  specifying  semantics  developed  in 
Theorem  8  is  more  appropriate  here,  as  it  is  straightforward  to  generalize  characteristic 
formulae  (Definition  10)  in  order  to  reduce  the  restrictions  on  where  predicate  symbols 
may  occur. 

When we are provided with multiple specifications for a single predicate symbol, we
require that they all hold. The meaning of a single specification $S$ is given by the
characteristic formula (Definition 10) associated with the translation of $S$ to a definition. This is
given by $\lceil \langle S \rangle \rceil$. The meaning of multiple specifications is then the conjunction of these
formulas, $\bigwedge_{S \in \mathcal{S}} \lceil \langle S \rangle \rceil$, which we abbreviate as $\lceil \mathcal{S} \rceil$. Formally, we have the following.

Definition 33. Let $\mathcal{S}$ be a set of specifications and let $\mathrm{dom}(\mathcal{S})$ give the set of predicate
names appearing on the left-hand side of “$\Leftrightarrow$” in specifications in $\mathcal{S}$. A store, heap
pair $(s, h)$ satisfies separation logic formula $Q$ given $\mathcal{S}$, written $(s, h) \models_{\mathcal{S}} Q$, if and only if
$(s, h) \models_X Q$ for all $X \in \mathcal{X}_{\mathrm{dom}(\mathcal{S})}$ such that $\models_X \lceil \mathcal{S} \rceil$.

When each predicate name in $\mathrm{dom}(\mathcal{S})$ appears to the left of $\Leftrightarrow$ in at most one
specification, then each predicate name is defined at most once by $\langle \mathcal{S} \rangle$ and so $\langle \mathcal{S} \rangle$ is a valid
list of definitions. In this case, our definition of satisfaction for specifications (Definition
33) coincides with our definition of satisfaction for definitions (Definition 6) and we have
$(s, h) \models_{\mathcal{S}} Q$ if and only if $(s, h) \models_{\langle \mathcal{S} \rangle} Q$. This follows immediately from Definition 33
and Theorem 8.

Even when we have multiple specifications, we can still relate Definition 33 to
Definition 6 by taking some subset of the specifications as predicate definitions and showing
that these definitions imply the remaining specifications, as demonstrated by the following
theorem. Of course, even when the theorem below does not apply, the semantics of
specifications are still well-defined by Definition 33.

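Theorem 27. Consider a set of specifications $\mathcal{S}$ and a subset $\mathcal{S}' \subseteq \mathcal{S}$ such that $\langle \mathcal{S}' \rangle$
is a valid set of inductive definitions (no predicate name is defined more than once)
and $\mathrm{dom}(\mathcal{S}) = \mathrm{dom}(\mathcal{S}')$. If $\models_{\langle \mathcal{S}' \rangle} \lceil \mathcal{S} \rceil$ then for all $Q$ we have $(s, h) \models_{\mathcal{S}} Q$ implies
$(s, h) \models_{\langle \mathcal{S}' \rangle} Q$.

Proof. Suppose $(s, h) \models_{\mathcal{S}} Q$ holds. Applying the definition of $\models_{\mathcal{S}}$ gives us the following.

$$
(s, h) \models_X Q \ \text{ for all } X \in \mathcal{X}_{\mathrm{dom}(\mathcal{S})} \text{ such that } \models_X \lceil \mathcal{S} \rceil \qquad (5.1)
$$

We must show $(s, h) \models_{\langle \mathcal{S}' \rangle} Q$. We have $\models_{\langle \mathcal{S}' \rangle} \lceil \mathcal{S} \rceil$, which by Theorem 8 implies the
following.

$$
\models_X \lceil \mathcal{S} \rceil \ \text{ for all } X \in \mathcal{X}_{\mathrm{dom}(\langle \mathcal{S}' \rangle)} \text{ such that } \models_X \lceil \langle \mathcal{S}' \rangle \rceil \qquad (5.2)
$$

Note that $\mathrm{dom}(\mathcal{S}) = \mathrm{dom}(\langle \mathcal{S}' \rangle)$ and thus we can combine (5.1) and (5.2), obtaining the
following.

$$
(s, h) \models_X Q \ \text{ for all } X \in \mathcal{X}_{\mathrm{dom}(\mathcal{S}')} \text{ such that } \models_X \lceil \langle \mathcal{S}' \rangle \rceil
$$

Again applying Theorem 8, we have $(s, h) \models_{\langle \mathcal{S}' \rangle} Q$, which was our goal. □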

Besides connecting satisfaction involving inductive specifications to satisfaction
involving inductive definitions, the theorem above also provides a means to ensure that
the use of multiple specifications does not introduce inconsistency into the system. The
premise of the theorem requires that a subset of the specifications can be taken as a set
of definitions and these definitions imply the validity of the other specifications. If this
holds, then the fact that each set of inductive definitions has a least fixed-point (Theorem
4) guarantees that the system remains consistent.

Thor does not check that the premise of the theorem above holds of the inductive
specifications provided. Thus, if use of the theorem is desired, the premise must be verified
by the user via other means. One option is to employ a system such as that given in
[Nguyen and Chin, 2008], which provides support for formally proving separation logic
implications involving inductive definitions and in many cases allows for automation of
such proofs.


5.3  Basic  Types 

Figure 5.5 lists the types used by the algorithm and the meta-variables used for terms of
these types. The type “$\tau$ option” is the type of optional values of type $\tau$. That is, a value
of type “$\tau$ option” may either be Some($a$) for some $a$ of type $\tau$ or it may be None.

Note that we have two types of variable: one that is used for program variables and
another that is used for instrumentation variables. In the following presentation we will
use underlines to indicate that a variable is of type IVar. Non-underlined variables $x, y, z$
and their subscripted forms denote program variables. Either type of variable can appear
quantified. The type Gen of instrumentation generators is dependent on a continuation $k$
of type K. This is used in stating the specification that these functions must satisfy. This
specification (as well as specifications for the other functions used by the implementation)
is given in Figures 5.6 and 5.7.

In the implementation, these different classes of variable are maintained as separate
types. However, the syntax and semantics of separation logic formulae and of programs
and instrumented programs were given in terms of a single set of variables, Vars. Thus,
when stating theorems about the implementation presented here, we need some way of
encoding these separate types. We will model them as disjoint subsets of the set Vars. To
support this set-based interpretation, we will sometimes use the name of one of these types
to represent the set of variables of that type. So the statement $x \in$ IVar should be read
as saying that $x$ is a variable in the subset of Vars corresponding to the type IVar in the
implementation.




Type            Description

E               The type of expressions $e$ as defined in Figure 2.1.
E list          The type of lists of expressions.
C               The type of commands $c$ as defined in Figure 2.1.
C list          The type of lists of commands, represented by the meta-variable $\vec{c}$.
K               The type of continuations $k$ as defined in Figure 2.1.
$\hat{K}$       The type of instrumented continuations $\hat{k}$. These are drawn from the same
                language as values of type K, but are assigned their own type for clarity.
P               The type of programs $P$ as defined in Figure 2.1.
$\hat{P}$       The type of instrumented programs $\hat{P}$. These are drawn from the same language
                as values of type P, but are assigned their own type for clarity.
$\Phi$          The type of symbolic state formulae $\varphi$ as defined in Figure 5.1.
G               The type of contexts $\Gamma$. Equal to Labels $\to$ ($\Phi$ set).
Gen($k$ : K)    The type of functions $f_k$, which are instrumentation generators for continuation
                $k$. These are functions of type $\Phi \to (G \times \hat{K})$ option that additionally satisfy
                the specification given in Figure 5.6.
Var             The type of program variables, $x, y, z, x_1, y_1, z_1, \ldots$
IVar            The type of instrumentation variables, $\underline{x}, \underline{y}, \underline{z}, \underline{x}_1, \underline{y}_1, \ldots$

Figure 5.5: Types used by the instrumentation algorithm.


Values of type G fill the same role as the contexts $\Gamma$ from Chapter 4. In that chapter,
we defined $\Gamma$ to be a function of type Labels $\to$ Q (a mapping from labels to separation
logic formulae). In the implementation, we work with elements of $\Phi$ instead of arbitrary
separation logic formulae. Since elements of $\Phi$ do not contain disjunction, but disjunction
is generally necessary to express the invariants in $\Gamma$, we let values of type G be functions
of type Labels $\to$ $\Phi$ set (mappings from labels to sets of formulae drawn from $\Phi$). The
sets in the range are interpreted disjunctively, so the set $\{\varphi_1, \varphi_2, \varphi_3\}$ corresponds to the
separation logic formula $\varphi_1 \vee \varphi_2 \vee \varphi_3$.
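
A minimal sketch of this representation is given below, assuming formulae are opaque strings and labels are strings as well. The helper names `add_disjunct`, `as_formula`, and `extends` are hypothetical, and reading the empty set as `false` is the usual convention for an empty disjunction rather than something stated in the text. The `extends` check corresponds to the requirement $\forall l.\ \Gamma'(l) \supseteq \Gamma(l)$ that appears in the specification of geninstCont.

```python
from typing import Dict, FrozenSet

# A context maps each label to a set of symbolic state formulae,
# read as the disjunction of its members.
Context = Dict[str, FrozenSet[str]]


def add_disjunct(gamma: Context, label: str, phi: str) -> Context:
    """Return a new context in which phi is one more disjunct at the given label."""
    updated = dict(gamma)
    updated[label] = gamma.get(label, frozenset()) | {phi}
    return updated


def as_formula(gamma: Context, label: str) -> str:
    """Render the invariant at a label as a single disjunctive formula."""
    disjuncts = sorted(gamma.get(label, frozenset()))
    return " \\/ ".join(disjuncts) if disjuncts else "false"


def extends(new: Context, old: Context) -> bool:
    """True when new(l) is a superset of old(l) for every label l of old."""
    return all(old[l] <= new.get(l, frozenset()) for l in old)
```

Because every update only adds disjuncts, a context built by repeated calls to `add_disjunct` always extends the contexts it was built from.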

The implementation also uses lists of commands in certain places. These are
represented by the meta-variable $\vec{c}$ and the type of such command lists is “C list.” We use
standard syntax for lists, writing $[c_1, \ldots, c_n]$ to represent a list of commands, [] to
represent the empty list, and $c :: \vec{c}$ to represent the cons operator. We define below an operation
that sequences a list of commands with a continuation.

$$
\begin{array}{rcl}
(c :: \vec{c}) \fatsemi k & = & c;\ (\vec{c} \fatsemi k) \\
\epsilon \fatsemi k & = & k
\end{array}
$$
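
The sequencing operation is a right fold over the command list. The following sketch shows this under the assumption that a continuation `c; k` is encoded as the tuple `("seq", c, k)` and that commands are opaque values; both the encoding and the name `seq_list` are choices made for the example only.

```python
def seq_list(cmds, k):
    """Sequence a list of commands in front of a continuation k."""
    if not cmds:
        # Empty list: the continuation is returned unchanged.
        return k
    head, *rest = cmds
    # (c :: cs) sequenced with k is c; (cs sequenced with k)
    return ("seq", head, seq_list(rest, k))
```

For instance, `seq_list(["a", "b"], "halt")` builds `("seq", "a", ("seq", "b", "halt"))`, and `seq_list([], "halt")` is just `"halt"`.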


5.4  Basic  Structure 

Figures  5.6  and  5.7  provide  a  guide  to  the  functions  used  in  the  implementation.  For  each 
function,  we  list  the  type  of  the  function  and  the  formal  specification  that  it  must  satisfy. 
The  functions  all  return  optional  values.  The  option  type  is  used  throughout  because  each 
operation  in  the  analysis  is  partial.  The  problems  we  are  solving  are  undecidable  in  general 
and  so  sometimes  a  solution  will  not  be  found.  It  is  also  the  case  that  sometimes  a  solution 
just  does  not  exist.  Our  instrumentation  system  only  allows  us  to  derive  instrumentations 
for  programs  that  are  memory  safe.  So  if  a  program  is  not  memory  safe,  no  implementation 
of  the  system  described  in  this  thesis  would  be  able  to  produce  an  instrumented  version 
of  that  program.  This  restriction  to  memory-safe  programs  arises  as  a  consequence  of  the 
Command  rule  in  Figure  4.1,  which  requires  that  for  every  command  c  in  the  original 
program, we can derive the partial correctness triple $\{Q\}\ c\ \{Q'\}$, where $Q$ is the current
precondition.  Since  partial  correctness  ensures  memory  safety  in  separation  logic,  such  a 
triple  is  only  derivable  if  c  is  memory  safe. 

If  the  instrumentation  process  gets  stuck  and  cannot  make  progress  in  the  analysis,  it 
will  return  a  result  of  None.  All  functions  called  by  the  main  procedure  for  the  analysis 
(which  is  called  instrument)  are  also  allowed  to  return  None  and  will  do  so  as  soon  as  a 
command  is  encountered  whose  safety  cannot  be  shown.  Once  this  occurs,  the  value  None 
propagates  up  the  call  stack  until  it  is  eventually  returned  by  the  instrument  procedure. 
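
The way a failure propagates through option-returning functions can be sketched as follows. This is a toy model and not the analysis itself: the state is simply the set of currently allocated variables, the commands `alloc`, `deref`, and `free` are invented for the example, and `bind`, `toy_post`, and `toy_analyse` are hypothetical names. Python's `None` plays the role of None, and any other value plays the role of Some.

```python
from typing import Callable, FrozenSet, Optional, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
Cmd = Tuple[str, str]  # (operation, variable)


def bind(value: Optional[A], f: Callable[[A], Optional[B]]) -> Optional[B]:
    """Apply f to a present value; pass None through unchanged."""
    return None if value is None else f(value)


def toy_post(allocated: FrozenSet[str], cmd: Cmd) -> Optional[FrozenSet[str]]:
    """Toy post-condition: None when the command's safety cannot be shown."""
    op, x = cmd
    if op == "alloc":
        return allocated | {x}
    if op == "deref":
        return allocated if x in allocated else None
    if op == "free":
        return allocated - {x} if x in allocated else None
    return None  # unknown command: give up


def toy_analyse(cmds: Sequence[Cmd]) -> Optional[FrozenSet[str]]:
    """Process commands in order; the first None becomes the overall result."""
    state: Optional[FrozenSet[str]] = frozenset()
    for cmd in cmds:
        state = bind(state, lambda s: toy_post(s, cmd))
        if state is None:
            return None
    return state
```

A program that allocates, dereferences, and then frees `x` is accepted, while one that dereferences `x` after freeing it yields `None` for the whole analysis, in the same way that a single unsafe command causes instrument to return None.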

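Undecidability of the problems involved can also manifest as non-termination. For
example, the implementation includes a theorem prover for showing implications between
symbolic state formulae. This problem is undecidable and, as a result, it is possible for
an implication to hold but for the theorem prover to fail to show this. If this occurs for an
implication that was crucial for construction of the instrumentation proof, the analysis will
diverge.

Function name and type, followed by its specification:

$f_k$ : Gen($k$)
    If $f_k(\varphi) = \mathrm{Some}(\Gamma, \hat{k})$ then
    $\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{\mathrm{IVar}} k$

instrument : $\Phi \times P \to (G \times \hat{P})$ option
    If $\mathrm{instrument}(\varphi_0, P) = \mathrm{Some}(\Gamma, \hat{P})$ then
    $\Gamma \vdash \hat{P} \blacktriangleright_{\mathrm{IVar}} P$ and $\varphi_0 \in \Gamma(\mathrm{initloc}(P))$

geninstCont : $G \times \Phi \times K \to (G \times \hat{K})$ option
    If $\mathrm{geninstCont}(\Gamma, \varphi, k) = \mathrm{Some}(\Gamma', \hat{k})$ then
    $\Gamma' \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{\mathrm{IVar}} k$ and $\forall l.\ \Gamma'(l) \supseteq \Gamma(l)$

partialPost : $\Phi \times C \to \Phi$ option
    If $\mathrm{partialPost}(\varphi, c) = \mathrm{Some}(\varphi')$ then
    $\{\varphi\}\ c\ \{\varphi'\}$

instPost : $\Phi \times C \times \mathrm{Gen}(k) \to (G \times \hat{K})$ option
    If $\mathrm{instPost}(\varphi, c, f_k) = \mathrm{Some}(\Gamma, \hat{k})$ then
    $\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{\mathrm{IVar}} (c; k)$

Figure 5.6: A summary of the primary functions involved in the implementation.
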


Function name and type, followed by its specification:

branchAnnot $: \Phi \times (E\ \mathrm{list}) \to E\ \mathrm{list}$
    If branchAnnot$(\varphi, [e_1, \ldots, e_n]) = [e'_1, \ldots, e'_n]$ then
    $\forall i.\ (\varphi \wedge e_i \Rightarrow e'_i)$

implies $: \Phi \times \Phi \times K \to K$ option
    If implies$(\varphi, \varphi', k') = Some(\hat{k})$ then for all $\Gamma, k$
    $\Gamma \vdash \{\varphi'\}\ k' \blacktriangleright_{IVar} k$
    implies
    $\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} k$

exposeCellThenInst $: \Phi \times Var \times Gen(k) \to (G \times K)$ option
    If exposeCellThenInst$(\varphi, x, f_k) = Some(\Gamma, \hat{k})$ then
    $\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} k$

abstract $: \Phi \to \Phi \times (C\ \mathrm{list})$
    If abstract$(\varphi) = (\varphi', c)$ then for all $\Gamma, k, k'$
    $\Gamma \vdash \{\varphi'\}\ k' \blacktriangleright_{IVar} k$
    implies
    $\Gamma \vdash \{\varphi\}\ (c ; k') \blacktriangleright_{IVar} k$

Figure 5.7: Additional functions used by the implementation. These are primarily concerned with
reasoning about implications between symbolic state formulae.

5.4.1  instrument

At the highest level of the implementation, we have a function instrument of type
$\Phi \times P \to (G \times P)$ option. A call to instrument$(\varphi_0, P)$ takes the following arguments.

$P$  The program to be analyzed.

$\varphi_0$  The precondition under which to analyze $P$.

It optionally returns a context $\Gamma$ and an instrumented program $\hat{P}$ such that the following
holds.

$$\Gamma \vdash \hat{P} \blacktriangleright_{IVar} P$$

If the algorithm cannot find a $\Gamma$, $\hat{P}$ such that this relation holds, then instrument returns
None.

In the property above, we make use of $IVar$, the set of all instrumentation variables.
In practice, any program uses only a finite subset of these. According to Theorem 19, we
can reduce the number of variables used in the statement above to $V' = fv(\hat{P}) - fv(P)$,
obtaining the following.

$$\Gamma \vdash \hat{P} \blacktriangleright_{V'} P$$

Recall that the role of $\Gamma$ in the instrumentation rules in Figure 4.1 was to give invariants
of the program at each label. The instrumentation analysis has to automatically infer such a
$\Gamma$, which is akin to inferring loop invariants. It also has to determine which instrumentation
commands should be added.

The code for the instrument function is given below. It consists of two
loops, where the first loop is focused on generating $\Gamma$ and the second loop performs the
instrumentation. This separation of concerns aids in the explanation of the algorithm, but
does cause us to recompute values that have already been produced. The results of function
calls (most crucially geninstCont) can easily be cached to avoid such duplicate effort.

The instrument function, as well as subsequent functions, makes use of a union
operation on contexts, defined pointwise as follows.

$$(\Gamma_1 \cup \Gamma_2)(l) = \Gamma_1(l) \cup \Gamma_2(l)$$
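
As a minimal sketch of this pointwise union, a context can be modelled in Python as a dictionary from labels to frozensets of formulae (formulae are plain strings here). The representation and the name `union_contexts` are assumptions made for illustration, as is the choice to treat a label missing from one context as mapped to the empty set.

```python
from typing import Dict, FrozenSet

Context = Dict[str, FrozenSet[str]]


def union_contexts(g1: Context, g2: Context) -> Context:
    """Pointwise union: (g1 U g2)(l) = g1(l) U g2(l).

    Assumption: a label absent from one context maps to the empty set.
    """
    labels = set(g1) | set(g2)
    return {l: g1.get(l, frozenset()) | g2.get(l, frozenset()) for l in labels}
```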


Function instrument$(\varphi_0, P)$. Main function of the instrumentation analysis.

    /* Set precondition of initial location to $\varphi_0$ */
    $\Gamma_{new}$ := $\{(l_0, \{\varphi_0\})\} \cup \{(l, \emptyset) \mid l \in dom(P) \wedge l \neq l_0\}$
    /* Analyze continuations until a fixed-point on $\Gamma_{new}$ is reached. */
    repeat
        $\Gamma_{old}$ := $\Gamma_{new}$
        foreach $l \in dom(P)$ do
            foreach $\varphi \in \Gamma_{new}(l)$ do
                match geninstCont$(\Gamma_{new}, \varphi, P(l))$ with
                case $Some(\Gamma, \hat{k})$
                    $\Gamma_{new}$ := $\Gamma$
                case None
                    return None  /* possible memory fault */
                end
    until $\Gamma_{new} = \Gamma_{old}$
    /* Generate instrumentations of all continuations
       starting from the invariants stored in $\Gamma_{new}$ */
    foreach $l \in dom(P)$ do
        let $\{\varphi_1, \ldots, \varphi_n\} = \Gamma_{new}(l)$ in
        let $Some(\Gamma_1, \hat{k}_1)$ = geninstCont$(\Gamma_{new}, \varphi_1, P(l))$ in
        let $Some(\Gamma_2, \hat{k}_2)$ = geninstCont$(\Gamma_{new}, \varphi_2, P(l))$ in
        ...
        let $Some(\Gamma_n, \hat{k}_n)$ = geninstCont$(\Gamma_{new}, \varphi_n, P(l))$ in
        $\hat{P}(l)$ := (branch true $\Rightarrow \hat{k}_1$, true $\Rightarrow \hat{k}_2$, ..., true $\Rightarrow \hat{k}_n$ end)
    end
    return $Some(\Gamma_{new}, \hat{P})$

The instrument function processes the program by passing each continuation to
the geninstCont function. geninstCont has type $G \times \Phi \times K \to (G \times K)$ option. It
takes a context $\Gamma$, a symbolic state formula representing a precondition $\varphi_0$ and a
continuation $k$ and optionally returns an instrumented continuation $\hat{k}$ together with a new context
$\Gamma'$ mapping labels to symbolic state formulae. The context $\Gamma$ describes the invariants at
locations that the analysis has discovered thus far. The returned context $\Gamma'$ is $\Gamma$ extended with
information about the states reachable through $k$. Formally, if geninstCont$(\Gamma, \varphi_0, k)$
returns $Some(\Gamma', \hat{k})$ then these should satisfy

$$\Gamma' \vdash \{\varphi_0\}\ \hat{k} \blacktriangleright_{IVar} k$$

It will also be the case that $\forall l.\ \Gamma'(l) \supseteq \Gamma(l)$. That is, $\Gamma'$ is an extension of $\Gamma$ obtained by
adding more disjuncts. If None is returned, it indicates that no such $\Gamma'$, $\hat{k}$ could be found.
After calling geninstCont, passing in $\Gamma_{new}$ as the context, the instrument function
then sets $\Gamma_{new}$ to be the context that was returned, thus ensuring the current context reflects
the information about reachable states discovered by geninstCont.

At a high level, we can describe the instrumentation analysis as a fixed-point
computation on $\Gamma$. Suppose we are analyzing the program $P$. First, we assume that $fv(P) \subseteq Var$
(we can always establish this by renaming variables). This ensures that the new variables
we will be adding (which are in $IVar$) are disjoint from the program variables. Initially we
set $\Gamma = \{(l_0, \{\varphi_0\})\} \cup \{(l, \emptyset) \mid l \in dom(P) \wedge l \neq l_0\}$. That is, $\Gamma$ maps the initial location
to $\{\varphi_0\}$ and all other locations to the empty set. We then repeatedly infer the post-conditions
of the continuations in the domain of $P$, adding these post-conditions to $\Gamma$. The function
$\Gamma$ maps each label to the set of reachable states that have been discovered at that label. If
this process converges, such that $\Gamma$ is no longer growing, this indicates that we have fully
characterized all the reachable states of the program. We then generate the instrumentation
of the program by instrumenting each continuation under each possible precondition.
The version of instrument given here discards the instrumentations that it generates in
the first loop, which computes $\Gamma$. In practice, these results are retained to avoid duplicating
work. A simple memoization scheme is sufficient to allow reuse of these previously
computed instrumentations.
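
The fixed-point computation can be sketched in Python as follows. This is only an illustration under simplifying assumptions: formulae and continuations are plain strings, a context is a dictionary from labels to frozensets, and `toy_geninst_cont` is a hypothetical stand-in that merely copies the current formula to the target of a `goto`. The real geninstCont also computes post-conditions and produces instrumented continuations, which this sketch discards.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Context = Dict[str, FrozenSet[str]]
GenInst = Callable[[Context, str, str], Optional[Tuple[Context, str]]]


def infer_context(program: Dict[str, str], l0: str, phi0: str,
                  geninst_cont: GenInst) -> Optional[Context]:
    """First loop of instrument: grow the context until it stops changing."""
    gamma_new: Context = {l: frozenset() for l in program}
    gamma_new[l0] = frozenset({phi0})
    while True:
        gamma_old = gamma_new
        for l in program:
            for phi in gamma_new[l]:
                result = geninst_cont(gamma_new, phi, program[l])
                if result is None:
                    return None  # possible memory fault
                gamma_new, _ = result  # instrumented continuation discarded
        if gamma_new == gamma_old:
            return gamma_new


def toy_geninst_cont(gamma: Context, phi: str,
                     k: str) -> Optional[Tuple[Context, str]]:
    """Hypothetical stand-in: k is either 'halt' or 'goto <label>'."""
    if k == "halt":
        return gamma, k
    target = k.split()[1]
    extended = dict(gamma)
    extended[target] = gamma[target] | {phi}  # only ever adds disjuncts
    return extended, k
```

Because the stand-in only ever adds formulae to the context, each iteration either enlarges the context or leaves it unchanged, which is the monotonicity property the second conjunct of the geninstCont specification provides.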

Proof of Correctness  We now show that if geninstCont satisfies its specification as
given in Figure 5.6, then instrument also satisfies its specification. That is, we show
the following.

    if instrument$(\varphi_0, P) = Some(\Gamma, \hat{P})$
    then $\Gamma \vdash \hat{P} \blacktriangleright_{IVar} P$ and $\varphi_0 \in \Gamma(initloc(P))$

Suppose instrument$(\varphi_0, P) = Some(\Gamma, \hat{P})$. This implies that the first loop has
terminated and each geninstCont call in the second loop returns $Some(\Gamma_i, \hat{k}_i)$.

That the first loop terminates implies that $\Gamma_{new} = \Gamma_{old}$. This implies that every
assignment $\Gamma_{new}$ := $\Gamma$ in the body of the loop left $\Gamma_{new}$ unchanged. That is, for each $\varphi^l_i$
such that $\varphi^l_i \in \Gamma_{new}(l)$ we have that geninstCont$(\Gamma_{new}, \varphi^l_i, P(l)) = Some(\Gamma^l_i, \hat{k}^l_i)$
implies $\Gamma^l_i = \Gamma_{new}$. Given the specification of geninstCont from Figure 5.6, these $\Gamma^l_i$ and
$\hat{k}^l_i$ also each satisfy $\Gamma^l_i \vdash \{\varphi^l_i\}\ \hat{k}^l_i \blacktriangleright_{IVar} P(l)$ which, applying the equalities $\Gamma^l_i = \Gamma_{new}$,
implies $\Gamma_{new} \vdash \{\varphi^l_i\}\ \hat{k}^l_i \blacktriangleright_{IVar} P(l)$ for each $\varphi^l_i$ and $\hat{k}^l_i$.

Since geninstCont is deterministic (in fact, all functions involved in our implementation
are deterministic), the calls to geninstCont in the second loop will also satisfy
these properties. In particular, $\Gamma_{new} \vdash \{\varphi^l_i\}\ \hat{k}^l_i \blacktriangleright_{IVar} P(l)$ for all $\varphi^l_i \in \Gamma_{new}(l)$ implies

$$\Gamma_{new} \vdash \{\textstyle\bigvee_i \varphi^l_i\}\ \mathrm{branch}\ \ldots, \mathrm{true} \Rightarrow \hat{k}^l_i, \ldots\ \mathrm{end} \blacktriangleright_{IVar} P(l) \qquad (5.3)$$

by repeated application of the Inst-Disj rule from Figure 4.1.

We will now show that the program $\hat{P}$ constructed by the second loop satisfies

$$\Gamma \vdash \hat{P} \blacktriangleright_{IVar} P \quad\text{and}\quad \varphi_0 \in \Gamma(initloc(P))$$

There is only one rule for showing this, namely the Inst-Prog rule in Figure 4.2. Since
$IVar$ was defined to be disjoint from the program variables, we have $IVar \cap fv(P) = \emptyset$,
which is the first premise of that rule. We have $dom(\hat{P}) = dom(P)$ from the fact that the
second loop defines $\hat{P}(l)$ for each $l \in dom(P)$. The initial locations are the same in each
program, so we have $initloc(\hat{P}) = initloc(P)$. Finally we must show the following.

$$\forall l \in dom(P).\ (\Gamma_{new} \vdash \{\Gamma_{new}(l)\}\ \hat{P}(l) \blacktriangleright_{IVar} P(l))$$

This follows from (5.3) and the fact that $\Gamma_{new}$ is interpreted disjunctively, so if
$\Gamma_{new}(l) = \{\varphi^l_1, \ldots, \varphi^l_n\}$ then this corresponds to the formula $\varphi^l_1 \vee \cdots \vee \varphi^l_n$.

The second conjunct of the specification for instrument follows from the second
conjunct of the specification of geninstCont. We have $\varphi_0 \in \Gamma_{new}(l_0)$ initially. We also
have that all calls geninstCont$(\Gamma_{new}, \varphi, k) = Some(\Gamma', \hat{k})$ satisfy $\forall l.\ \Gamma'(l) \supseteq \Gamma_{new}(l)$,
which implies that $\varphi_0 \in \Gamma'(l_0)$. From this it follows that $\varphi_0 \in \Gamma_{new}(l_0)$ for the final value
of $\Gamma_{new}$ computed by instrument.

Organization  We  will  now  proceed  to  discuss  geninstCont  and  the  other  functions 
that  the  implementation  makes  use  of.  These  are  all  mutually  recursive  and  thus  difficult 
to discuss separately. However, the guide in Figures 5.6 and 5.7 should be of use in
understanding at a high level the role of functions that have yet to be discussed. We will also
attempt  to  informally  give  the  intuition  behind  functions  that  are  being  used,  but  whose 
full  description  is  yet  to  come.  As  we  discuss  each  function,  we  prove  that  it  satisfies  its 
specification  as  given  in  Figures  5.6  and  5.7. 

5.4.2  geninstCont 

The function call geninstCont$(\Gamma, \varphi, k)$ takes the following arguments.

$\Gamma$  A mapping from labels to sets of abstract state formulae that describes the
invariants that have already been discovered.

$\varphi$  A symbolic state formula that gives the current precondition.

$k$  The continuation to be instrumented.

geninstCont has an optional return value. If it returns $Some(\Gamma', \hat{k})$, then these must
satisfy the following.

$$(\Gamma' \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} k) \wedge (\forall l.\ \Gamma'(l) \supseteq \Gamma(l))$$

Recall that $\hat{k}$ consists of the commands and control structure of $k$, plus possibly some
additional commands over variables in $IVar$.

The code for geninstCont is given below. We first check if the precondition
is unsatisfiable by calling implies$(\varphi, \mathrm{false}, \ldots)$, which returns $Some(\hat{k})$ only if false
can be established from the precondition $\varphi$ (modulo the instrumentation commands, this
corresponds to showing $\varphi \Rightarrow \mathrm{false}$). Such inconsistency can occur due to the accumulation
of constraints from branch conditions. implies also ensures that $\hat{k}$ is an instrumentation
command that establishes the precondition false. A formal summary of implies is given
in Figure 5.7. Since $\Gamma \vdash \{\mathrm{false}\}\ (\mathrm{assert(false)}; \mathrm{halt}) \blacktriangleright_{IVar} k$ holds for any $k$ by rule False
from Figure 4.1, our specification of implies ensures that the following holds.

$$\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} k$$

This result satisfies the specification for geninstCont from Figure 5.6.

If $\varphi$ is consistent, then the instrumentation depends on the form of the continuation
$k$. We now consider each case in turn, describing the operations performed by
geninstCont and presenting the soundness argument at the same time (that is, we show
in each case that geninstCont satisfies its specification as given in Figure 5.6).


Function geninstCont$(\Gamma, \varphi, k)$. Generates an instrumented continuation for $k$
starting from precondition $\varphi$.

    if implies$(\varphi, \mathrm{false}, (\mathrm{assume(false)}; \mathrm{halt})) = Some(\hat{k})$ then
        /* If $\varphi$ is unsatisfiable, return $\hat{k}$. */
        return $Some(\Gamma, \hat{k})$
    else
        /* Otherwise, continue instrumenting $k$. */
        match $k$ with
        case $(c; k')$
            return instPost$(\varphi, c, \lambda x.\ $geninstCont$(\Gamma, x, k'))$
        case branch $e_1 \Rightarrow k_1, \ldots, e_n \Rightarrow k_n$ end
            let $[e'_1, \ldots, e'_n]$ = branchAnnot$(\varphi, [e_1, \ldots, e_n])$ in
            let $Some(\Gamma_1, \hat{k}_1)$ = geninstCont$(\Gamma, \varphi \wedge e_1, k_1)$ in
            ...
            let $Some(\Gamma_n, \hat{k}_n)$ = geninstCont$(\Gamma, \varphi \wedge e_n, k_n)$ in
            return $Some(\bigcup_i \Gamma_i,$ branch $e_1 \Rightarrow \mathrm{assume}(e'_1); \hat{k}_1, \ldots, e_n \Rightarrow \mathrm{assume}(e'_n); \hat{k}_n$ end$)$
            match failed $\Rightarrow$ return None
        case goto $l$
            if $\exists \varphi' \in \Gamma(l).$ implies$(\varphi, \varphi', \mathrm{goto}\ l) = Some(\hat{k})$ then
                return $Some(\Gamma, \hat{k})$
            else
                let $(\varphi', c)$ = abstract$(\varphi)$ in
                return $Some(\Gamma[l \mapsto (\Gamma(l) \cup \{\varphi'\})], (c ; \mathrm{goto}\ l))$
        case halt
            return $Some(\Gamma, \mathrm{halt})$
        case abort
            return $Some(\Gamma, \mathrm{abort})$
        end

CASE $k = (c; k')$: In the case of a command, where $k = (c; k')$, we construct the
following function, which we will refer to here as $f_{k'}$.

$$f_{k'} = \lambda x.\ \mathrm{geninstCont}(\Gamma, x, k')$$

Given the specification of geninstCont from Figure 5.6, this function has the type
$Gen(k')$. It can thus be passed to instPost, which expects such a function as its third
argument.

The function call instPost$(\varphi, c, f_{k'})$ computes the post-condition of $c$ with respect
to the state $\varphi$. It then calls $f_{k'}$ with that post-condition. The reason instPost operates
this way, instead of simply returning the post-condition, is that it is sometimes necessary
to perform case splits before the post-condition of $c$ can be determined. In such situations,
the post-condition can be different under each branch of the case split. Passing $f_{k'}$ to
instPost yields a simple method of obtaining instrumentations of $k$ for each of these
cases.

By examining the specifications given in Figure 5.6, we can verify that the code in the
$k = (c; k')$ case is correct. To satisfy the specification for geninstCont, this case must
return $Some(\Gamma', \hat{k})$ such that

$$\Gamma' \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} (c; k')$$

(or return None). Checking the specification for instPost, we see that the return value
of instPost$(\varphi, c, f_{k'})$ satisfies this exactly.


CASE $k = \mathrm{branch}\ \ldots, e_i \Rightarrow k_i, \ldots\ \mathrm{end}$:

For each case $i$ of the branch, we conjoin $e_i$ to the current symbolic state $\varphi$ and
then pass this updated state to a recursive call of geninstCont. By the specification
of geninstCont, this will return either None or $Some(\Gamma_i, \hat{k}_i)$ such that the following
holds.

$$\Gamma_i \vdash \{\varphi \wedge e_i\}\ \hat{k}_i \blacktriangleright_{IVar} k_i$$

We also call branchAnnot$(\varphi, [e_1, \ldots, e_n])$. This returns $[e'_1, \ldots, e'_n]$ such that each $e'_i$
is an over-approximation of $e_i$ in the state $\varphi$. That is, $\varphi \wedge e_i \Rightarrow e'_i$ for all $e_i$, $e'_i$. The idea
is that, whereas the $e_i$ are statements over program variables, which may involve variables
of address type, the $e'_i$ will be statements over instrumentation variables.

For example, under the symbolic state $ls(n; x, \mathrm{nil})$, the branch condition $x = \mathrm{nil}$ might
be translated to $n = 0$. In this case, the call

    branchAnnot$(ls(n; x, \mathrm{nil}), [x = \mathrm{nil}, x \neq \mathrm{nil}])$

would return

    $[n = 0, n > 0]$
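
The example can be mimicked with a toy Python function. It is a sketch under strong assumptions: the symbolic state is a single predicate ls(n; x, nil) given as the pair `(n, x)`, conditions are strings, and the lookup table is hypothetical. The actual branchAnnot relies on the theorem prover rather than a fixed table.

```python
from typing import List, Tuple


def branch_annot(ls_pred: Tuple[str, str], conditions: List[str]) -> List[str]:
    """Translate branch conditions under the state ls(n; x, nil).

    ls_pred is the pair (n, x). Each result over-approximates its input
    condition in that state, using only the instrumentation variable n.
    """
    n, x = ls_pred
    table = {f"{x} = nil": f"{n} = 0", f"{x} != nil": f"{n} > 0"}
    # 'true' is always a sound over-approximation when nothing better is known.
    return [table.get(e, "true") for e in conditions]
```

Falling back to `true` keeps the required implication trivially valid, at the cost of giving the numeric program no information about that branch.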

The specifications of the recursive geninstCont calls and the branchAnnot
function are sufficient to allow us to show that this case satisfies the specification of
geninstCont. The implications $\varphi \wedge e_i \Rightarrow e'_i$ allow us to apply the Inst-Assume rule to
conclude

$$\Gamma_i \vdash \{\varphi \wedge e_i\}\ (\mathrm{assume}(e'_i); \hat{k}_i) \blacktriangleright_{IVar} k_i$$

Let $\Gamma' = \bigcup_i \Gamma_i$. Since the sets given by $\Gamma_i(l)$ are interpreted disjunctively (that is,
$\bigcup_i(\Gamma_i)(l)$ corresponds to the separation logic formula $\bigvee_i(\Gamma_i(l))$) we have that for all $l, i$
the implication $\Gamma_i(l) \Rightarrow \Gamma'(l)$ holds. Thus we can apply Lemma 12 to obtain

$$\Gamma' \vdash \{\varphi \wedge e_i\}\ (\mathrm{assume}(e'_i); \hat{k}_i) \blacktriangleright_{IVar} k_i$$

for all $e_i$, $k_i$. This then allows us to apply the Branch rule to obtain

$$\Gamma' \vdash \{\varphi\}\ \mathrm{branch}\ \ldots, e_i \Rightarrow \mathrm{assume}(e'_i); \hat{k}_i, \ldots\ \mathrm{end} \blacktriangleright_{IVar} k$$

Thus the value returned satisfies the specification for geninstCont.


CASE $k = \mathrm{goto}\ l$:

In the goto case, there are two approaches, depending on what can be shown of the
current state $\varphi$.

“then” branch  If there is some $\varphi'$ in $\Gamma$ associated with the same label we are jumping
to such that $\varphi \Rightarrow \varphi'$, then we can apply the Goto rule followed by the Strengthening
rule as follows.

We first note that if $\varphi' \in \Gamma(l)$ then we have the following by the Goto rule from Figure
4.1.

$$\Gamma \vdash \{\varphi'\}\ \mathrm{goto}\ l \blacktriangleright_{IVar} \mathrm{goto}\ l$$

Examining the specification for the call to implies$(\varphi, \varphi', \mathrm{goto}\ l)$, we see that if the result
is $Some(\hat{k})$ then this ensures that the following holds.

$$\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} \mathrm{goto}\ l$$

Thus returning $Some(\Gamma, \hat{k})$ allows this case to satisfy the specification for geninstCont.

In essence, the goal of implies$(\varphi, \varphi', k')$ is to generate an instrumentation that
connects $\varphi$ to $\varphi'$. This instrumentation may involve applications of Inst-Assign, which will
prepend commands to $k'$. It may also make use of Strengthening and case-splitting
rules such as our Inst-Branch derived rule from Section 4.1.3.

As a simple example, consider the call implies$(ls(n - 1; x, \mathrm{nil}), ls(n; x, \mathrm{nil}), \mathrm{goto}\ l)$,
where $\Gamma$ maps $l$ to $\{ls(n; x, \mathrm{nil})\}$. This would return the instrumented continuation
$(n := n - 1; \mathrm{goto}\ l)$, where the addition of the command $n := n - 1$ ensures that
if $ls(n - 1; x, \mathrm{nil})$ is the precondition, then $ls(n; x, \mathrm{nil})$ will hold just prior to the goto $l$
statement.


“else” branch  If we instead end up executing the “else” branch in the goto $l$ case, then
we call abstract$(\varphi)$. The goal of abstract is to weaken symbolic state formulae
so that they cover more states. These more abstract states are then more likely to be loop
invariants.

For example, during execution of a program that creates a linked list, we might
encounter a symbolic state such as the one below.

$$\varphi_1 = \exists z.\ (x \mapsto [\mathrm{next} : z]) * (z \mapsto [\mathrm{next} : \mathrm{nil}])$$

This formula implies the formula below, which would be a valid loop invariant for a list
creation routine.

$$ls(n; x, \mathrm{nil})$$

In order to establish this formula, we need to initialize $n$. This is the role of the second
component of the return value of abstract. The initialization command for this example
is $n := 2$ and so abstract$(\varphi_1)$ would return $(ls(n; x, \mathrm{nil}), [n := 2])$.
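
A toy version of this folding step is sketched below in Python. It assumes the heap part of the formula is given as a dictionary mapping each address to the value of its `next` field, and that the only abstraction attempted is folding a nil-terminated chain into a single list predicate. The name `abstract_chain` is hypothetical, and unlike the real abstract function this sketch simply gives up (returns None) on shapes it does not recognise.

```python
from typing import Dict, List, Optional, Tuple


def abstract_chain(cells: Dict[str, str],
                   start: str) -> Optional[Tuple[str, List[str]]]:
    """Fold a chain of points-to facts into ls(n; start, nil).

    Returns the abstracted formula together with the command list that
    initializes the instrumentation variable n to the chain length.
    """
    length, cur, seen = 0, start, set()
    while cur != "nil":
        if cur not in cells or cur in seen:
            return None  # dangling or cyclic chain: outside this sketch
        seen.add(cur)
        cur = cells[cur]
        length += 1
    return f"ls(n; {start}, nil)", [f"n := {length}"]
```

For the two-cell state above, `abstract_chain({"x": "z", "z": "nil"}, "x")` yields the list predicate together with the single command `n := 2`.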

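The formal specification of abstract given in Figure 5.7 ensures that if
abstract$(\varphi)$ returns $(\varphi', c)$ then for all $\Gamma, k, k'$ we have that $\Gamma \vdash \{\varphi'\}\ k' \blacktriangleright_{IVar} k$
implies $\Gamma \vdash \{\varphi\}\ (c ; k') \blacktriangleright_{IVar} k$. Let $\Gamma' = \Gamma[l \mapsto (\Gamma(l) \cup \{\varphi'\})]$. Clearly $\forall l.\ \Gamma'(l) \supseteq \Gamma(l)$.
We have that $\Gamma' \vdash \{\varphi'\}\ \mathrm{goto}\ l \blacktriangleright_{IVar} \mathrm{goto}\ l$. The specification of abstract then tells us
that $\Gamma' \vdash \{\varphi\}\ (c ; \mathrm{goto}\ l) \blacktriangleright_{IVar} \mathrm{goto}\ l$ holds. Since we return $Some(\Gamma', (c ; \mathrm{goto}\ l))$, this
establishes the specification of geninstCont in this case of the match.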


CASE  halt,  abort:  In  the  case  of  halt  or  abort,  no  instrumentation  commands  are  added. 
The  fact  that  the  return  values  in  these  cases  satisfy  the  specification  for  geninstCont 
follows  directly  from  the  rules  Halt  and  Abort  in  Figure  4.1. 
Second Conjunct

We now show that the second conjunct in the specification of geninstCont holds. We
must show that if geninstCont$(\Gamma, \varphi, k) = Some(\Gamma', \hat{k})$ then

$$\forall l.\ \Gamma'(l) \supseteq \Gamma(l)$$

In the branch case, we have $\forall l.\ \Gamma_i(l) \supseteq \Gamma(l)$ by the inductive hypothesis. We then
have $\forall l.\ (\bigcup_i \Gamma_i)(l) \supseteq \Gamma(l)$ by the definition of $\cup$ on contexts. The halt and abort cases are immediate,
as $\forall l.\ \Gamma(l) \supseteq \Gamma(l)$ trivially holds. This leaves the $(c; k)$ case and the goto $l$ case.

For $(c; k)$ we need to examine the definition of instPost. This is defined in the
next section and we will discuss it in more detail there. For now, it suffices to note that
the context instPost returns is the same context produced by the function passed as the
third argument (in this case, a recursive call to geninstCont). This lets us apply the
inductive hypothesis, from which this case then immediately follows.

For goto $l$, the “then” branch is immediate as the input context is returned unchanged.
The “else” branch returns $\Gamma[l \mapsto (\Gamma(l) \cup \{\varphi'\})]$. Since $\Gamma(l) \cup \{\varphi'\} \supseteq \Gamma(l)$ we have our result.

5.4.3  instPost 

The function instPost, which is responsible for instrumenting commands, is given
below. A call instPost$(\varphi, c, f_k)$ takes the following arguments.

$\varphi$  A symbolic state formula that gives the precondition.

$c$  The command whose post-condition should be taken.

$f_k$  The instrumentation generator to apply to the post-condition when it is
obtained.

instPost has an optional return value. If it returns $Some(\Gamma, \hat{k})$, then these must satisfy
the following.

$$\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} (c; k)$$
We write $A[x]$ to denote the commands that access the cell at $x$.

$$A[x] ::= y := x.f \mid \mathrm{free}\ x \mid x.f := e$$

These commands require a heap cell to exist at $x$ in order to ensure that execution does not
result in a memory fault.

Function instPost$(\varphi, c, f_k)$. Takes the post-condition of $\varphi$ with respect to the
command $c$ and applies $f_k$ to the result, returning an instrumentation of $c; k$.

    fun doPost$(\varphi, c, f_k)$ =
        match partialPost$(\varphi, c)$ with
        case $Some(\varphi')$
            if $f_k(\varphi') = Some(\Gamma, \hat{k})$ then
                return $Some(\Gamma, (c; \hat{k}))$
            else
                return None
        case None
            return None
        end
    in
    match $c$ with
    case $A[x]$
        return exposeCellThenInst$(\varphi, x, \lambda\varphi.\ $doPost$(\varphi, c, f_k))$
    otherwise
        return doPost$(\varphi, c, f_k)$
    end

The function instPost makes use of two helper functions: partialPost and
exposeCellThenInst. The partialPost function returns the post-condition of
a command with respect to some precondition, but is not able to perform the theorem
proving that is sometimes necessary to show that the heap contains a cell at a given address.
The exposeCellThenInst function fills in this shortcoming by making calls into a theorem
prover for symbolic state formulae.
Helper Function: partialPost

The code for partialPost is given below. This function implements a partial
post-condition operator. It takes the following arguments.

$\varphi$  A symbolic state formula that gives the current precondition.

$c$  The command for which the post-condition should be computed.

It returns either None or $Some(\varphi')$. If $Some(\varphi')$ is returned, then this formula satisfies the
following.

$$\{\varphi\}\ c\ \{\varphi'\}$$

For assignment, the standard strongest post-condition rule is used. For allocation, we
use the standard post-condition rule from separation logic (Reynolds [2002]). For
non-deterministic assignment we existentially quantify what is now the previous value of $x$.
For skip we leave the precondition unchanged.

The rules for the heap-manipulating commands first check that the precondition
syntactically contains a points-to predicate specifying the contents of the heap cell being
accessed. For example, in the case for $x_1 := x_2.f$, the expression

    let $(\exists \vec{z}.\ (\Sigma * (x_2 \mapsto [f : e, \rho])) \wedge \Pi) = \varphi$ with $x_1, x_2 \notin \vec{z}$ in

matches $\varphi$ against the pattern $\exists \vec{z}.\ (\Sigma * (x_2 \mapsto [f : e, \rho])) \wedge \Pi$. The match succeeds if $\varphi$
can be shown to have the given form using only the equivalence defined in Figure 5.2. If
the match succeeds, then $\vec{z}$, $\Sigma$, $e$, $\rho$, and $\Pi$ are bound to the sub-formulae at these positions
in $\varphi$. Additionally, the condition $x_1, x_2 \notin \vec{z}$ is enforced, which may require alpha-varying
$\varphi$ prior to performing the matching.

Once this syntactic match has been performed, the precondition is updated to reflect
the effect of executing the command. Heap-manipulating commands such as $x_1 := x_2.f$
are only safe in states containing a heap cell at a given address (in this case a heap cell
at $x_2$ with field $f$). If the required heap cell does not appear in the formula explicitly as
a points-to predicate (that is, if the syntactic match fails), then the function returns None.
Otherwise it returns $Some(\varphi')$ where $\varphi'$ is the post-condition.
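
The purely syntactic character of this check can be illustrated with a short Python sketch for the field-read command. The representation is an assumption made for the example: the spatial part of the formula is a dictionary from addresses to records, and the pure part is a list of strings. The sketch also assumes that the assigned variable does not occur in the precondition, so the renaming to a fresh primed variable is omitted.

```python
from typing import Dict, List, Optional, Tuple

Cells = Dict[str, Dict[str, str]]


def partial_post_load(cells: Cells, pure: List[str], x1: str, x2: str,
                      f: str) -> Optional[Tuple[Cells, List[str]]]:
    """Post-condition of x1 := x2.f, or None if no explicit cell for x2 exists.

    Assumes x1 does not occur in the precondition (no renaming needed).
    """
    if x2 not in cells or f not in cells[x2]:
        return None  # syntactic match failed; no theorem proving is attempted
    value = cells[x2][f]
    return cells, pure + [f"{x1} = {value}"]
```

A state that only implies the existence of the cell, for instance through a non-empty list predicate, makes this function return None, which is the gap that exposeCellThenInst is meant to close.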
Function partialPost$(\varphi, c)$. Returns the post-condition for command $c$ given
precondition $\varphi$. All primed variables are chosen to be fresh. Side conditions are
satisfied by alpha-varying $\varphi$ (the match fails if this is not possible).

    match $c$ with
    case $x := e$
        return $Some(\exists x'.\ (\varphi[x'/x] \wedge x = e[x'/x]))$
    case $x := \mathrm{alloc}(f_1, \ldots, f_n)$
        return $Some(\exists x', y'_1, \ldots, y'_n.\ (\varphi[x'/x] * (x \mapsto [f_1 : y'_1, \ldots, f_n : y'_n])))$
    case $x := ?$
        return $Some(\exists x.\ \varphi)$
    case skip
        return $Some(\varphi)$
    case $x_1 := x_2.f$
        let $(\exists \vec{z}.\ (\Sigma * (x_2 \mapsto [f : e, \rho])) \wedge \Pi) = \varphi$ with $x_1, x_2 \notin \vec{z}$ in
        let $e' = e[x'_1/x_1]$ in
        let $\rho' = \rho[x'_1/x_1]$ in
        let $\Sigma' = \Sigma[x'_1/x_1]$ in
        let $\Pi' = \Pi[x'_1/x_1]$ in
        return $Some(\exists x'_1, \vec{z}.\ (\Sigma' * (x_2[x'_1/x_1] \mapsto [f : e', \rho'])) \wedge (\Pi' \wedge x_1 = e'))$
        match failed $\Rightarrow$ return None
    case $x.f := e$
        let $(\exists \vec{z}.\ (\Sigma * (x \mapsto [f : e', \rho])) \wedge \Pi) = \varphi$ with $fv(x, e) \cap \vec{z} = \emptyset$ in
        return $Some(\exists \vec{z}.\ (\Sigma * (x \mapsto [f : e, \rho])) \wedge \Pi)$
        match failed $\Rightarrow$ return None
    case free $x$
        let $(\exists \vec{z}.\ (\Sigma * (x \mapsto [\rho])) \wedge \Pi) = \varphi$ with $x \notin \vec{z}$ in
        return $Some(\exists \vec{z}.\ \Sigma \wedge \Pi)$
        match failed $\Rightarrow$ return None
    end
Helper Function: exposeCellThenInst

In order to produce a result for a command that accesses a heap cell at $x$, the code
discussed above for partialPost requires the precondition to contain a term that
syntactically matches $(x \mapsto [\rho]) * \varphi$ for some $\rho$ and $\varphi$. This causes the code to return None in
some cases where a post-condition does exist. An example of such a case is the formula
$ls(n; x, \mathrm{nil}) \wedge n > 0$, which implies that the list at $x$ is non-empty and thus $x$ is a valid
pointer into the heap. However, discovering this fact requires reasoning about separation
logic implications.

We will talk about separation logic reasoning in Section 5.5. In the meantime, we
will give a high-level description of exposeCellThenInst, which is the function that
makes the appropriate call into our theorem proving system to show that a heap cell at
some address $x$ exists. The call exposeCellThenInst$(\varphi, x, f_k)$ takes the following
arguments.

$\varphi$  A symbolic state formula that gives the current precondition.

$x$  The address of the heap cell to be revealed.

$f_k$  The instrumentation generator to apply to the formula that results from
showing that $x$ is in the heap.

If exposeCellThenInst$(\varphi, x, f_k)$ returns $Some(\Gamma, \hat{k})$ then these must satisfy

$$\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} k$$

As with the implies function, informally described in Section 5.4.2, the instrumentation
commands added to the result of $f_k$ in order to obtain $\hat{k}$ may consist of assignments or
branches. To take a branching example, consider the following symbolic state formula.

$$\varphi_0 = (ls(a; x, y) * ls(b; y, x)) \wedge a + b > 0$$

This states that there is a non-empty cyclic singly-linked list with $x$ and $y$ pointing into
it. The pointers $x$ and $y$ divide the cycle into two segments: one starting at $x$ and ending
at $y$ and the other running from $y$ back to $x$. The condition $a + b > 0$ implies that there
is at least one heap cell in the cyclic list. This implies that at least one of the segments is
non-empty, but it does not specify which. If we want to expose the heap cell at $x$, we must
first case split on whether the list segment starting at $x$ is empty. We obtain the following
if the segment starting at $x$ is non-empty (and thus $a > 0$)

$$\varphi_1 = (\exists z.\ x \mapsto [\mathrm{next} : z] * ls(a - 1; z, y) * ls(b; y, x)) \wedge a > 0$$

and the following if that segment is empty (and thus $a = 0$)

$$\varphi_2 = (\exists z.\ x \mapsto [\mathrm{next} : z] * ls(b - 1; z, x)) \wedge x = y \wedge a = 0 \wedge b > 0$$

If $f_k(\varphi_1) = Some(\Gamma_1, \hat{k}_1)$ and $f_k(\varphi_2) = Some(\Gamma_2, \hat{k}_2)$ then the call

    exposeCellThenInst$(\varphi_0, x, f_k)$

would return

    $Some(\Gamma_1 \cup \Gamma_2,$ branch $a > 0 \Rightarrow \hat{k}_1, a = 0 \Rightarrow \hat{k}_2$ end$)$
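
How the results of the individual cases are combined can be sketched in Python. The sketch takes the case split itself as given, as a list of (guard, exposed formula) pairs, since producing it is the job of the theorem prover. Formulae and continuations are plain strings, contexts are dictionaries from labels to frozensets, and the names `expose_then_inst` and `union` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Context = Dict[str, FrozenSet[str]]
Gen = Callable[[str], Optional[Tuple[Context, str]]]


def union(g1: Context, g2: Context) -> Context:
    """Pointwise union of two contexts."""
    return {l: g1.get(l, frozenset()) | g2.get(l, frozenset())
            for l in set(g1) | set(g2)}


def expose_then_inst(cases: List[Tuple[str, str]],
                     f_k: Gen) -> Optional[Tuple[Context, str]]:
    """Apply f_k to each exposed formula and branch on the numeric guards."""
    gamma: Context = {}
    arms = []
    for guard, phi in cases:
        result = f_k(phi)
        if result is None:  # one failing case fails the whole call
            return None
        gamma_i, k_i = result
        gamma = union(gamma, gamma_i)
        arms.append(f"{guard} => {k_i}")
    return gamma, "branch " + ", ".join(arms) + " end"
```

Passing the generator as an argument is what allows each case to receive its own instrumented continuation, while the guards of the resulting branch mention only instrumentation variables.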

Correctness

We now show that instPost satisfies its specification. We first consider the case where
$c$ does not match $A[x]$. In this case, instPost calls doPost$(\varphi, c, f_k)$ which calls
partialPost$(\varphi, c)$. Suppose partialPost$(\varphi, c)$ returns $Some(\varphi')$. Then by its
specification in Figure 5.6 we have

$$\{\varphi\}\ c\ \{\varphi'\} \qquad (5.4)$$

Since $f_k$ has type $Gen(k)$ we have that if $f_k(\varphi')$ returns $Some(\Gamma, \hat{k})$ then the following
holds.

$$\Gamma \vdash \{\varphi'\}\ \hat{k} \blacktriangleright_{IVar} k \qquad (5.5)$$

We can then apply the Command rule from Figure 4.1 to (5.4) and (5.5) to obtain

$$\Gamma \vdash \{\varphi\}\ (c; \hat{k}) \blacktriangleright_{IVar} (c; k)$$

which establishes that our return value satisfies the specification for instPost.

For the $c = A[x]$ case, we first note that one consequence of the argument above about
doPost is that the function

$$\lambda\varphi.\ \mathrm{doPost}(\varphi, c, f_k)$$

has type $Gen(c; k)$. This allows it to be passed to exposeCellThenInst. The
specification of exposeCellThenInst then tells us that if this call returns $Some(\Gamma, \hat{k})$ then
we have

$$\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} (c; k)$$

which satisfies the specification for instPost.


5.5  Theorem  Proving 

We now describe our proof system for symbolic state formulae.[2] This forms the basis of
many of the remaining functions. Specifically, the functions exposeCellThenInst,
implies, and branchAnnot all make use of the theorem prover. Each of these functions
answers a slightly different problem, so we will actually describe three different
proof systems. However, the vast majority of the proof rules are shared by all three systems.
We will thus start with the simplest problem, entailment, which is used by the
implies function, and then describe our solution for the more complex problems of
frame inference and pure abduction by focusing on the differences between the proof
systems for these problems and the proof system for entailment. The discussion of pure
abduction is delayed until Section 5.10, as it constitutes an optional portion of the
algorithm. Instrumentations for programs can be produced without a proof system
for pure abduction, but including this system enables us to generate more precise
instrumentations.

[2] As symbolic state formulae correspond to separation logic formulae of a restricted form, this can also
be viewed as a proof system for separation logic formulae of this form.

5.5.1 Entailment

Our system for entailment targets the same problem as Berdine et al. [2004] and Nguyen
and Chin [2008], although our system is unique in that it generates instrumentation
commands during proof search. This addition is necessary if the prover is to be used in a
system for producing instrumented programs, such as the one we are considering in this
chapter.

We start with an example showing when entailment is useful. Suppose we have reached
symbolic state

\[ \varphi = ls(n + 1; x, nil) \]

and have previously discovered that the symbolic state

\[ \varphi' = ls(n; x, nil) \]

is reachable at the same location. In this case, we would like to notice that we can reach
$\varphi'$ from $\varphi$ by executing the instrumentation command $n := n + 1$. If we can show this,
then we may stop exploring this branch. Failing to notice such situations can lead
to non-termination of the algorithm. This is the sort of query performed by the implies
function and supported by our proof system for entailment.

Formally, we will define the following judgment.

\[ \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k} \]

In the above, $\varphi$, $\varphi'$, $S$, and $\hat{k}'$ are considered inputs and $\hat{k}$ is the output. Recall that $S$ is a
set of inductive predicate specifications as described in Section 5.2.

The proof system will be designed such that if the judgment $\varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}$ holds and
$\Gamma \vdash \{\varphi'\}\ \hat{k}' \blacktriangleright_{IVar} k$ for some $\Gamma$, $k$, then

\[ \Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} k \]

To establish this, the entailment system can be viewed as transforming a proof of
$\Gamma \vdash \{\varphi'\}\ \hat{k}' \blacktriangleright_{IVar} k$ into a proof of $\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} k$ by using the instrumentation
rules in Figure 4.1 to fill in the gaps between $\varphi$ and $\varphi'$. In fact, we will establish
soundness of the proof system by showing that each rule presented can be justified in
terms of rules from Figure 4.1.

As an example, if $\varphi$ is

\[ ls(n_1 + 1; y, x) * ls(n_2; x, nil) \wedge x \neq nil \]

and $\varphi'$ is

\[ \exists z, v.\ ls(n_1; y, x) * x \mapsto [next : z, data : v] * ls(n_2'; z, nil) \]

then the system may reason that $\varphi'$ can be reached from $\varphi$ by inserting the instrumentation
command $n_1 := n_1 + 1$. The post-condition of this command is

\[ ls(n_1; y, x) * ls(n_2; x, nil) \wedge x \neq nil \]

from which $\varphi'$ follows by pure separation logic reasoning.

Bookkeeping 

At a high level, proving proceeds by matching spatial predicates to the left of $\Longrightarrow_S$ with
spatial predicates on the right. This matching procedure is essentially an application of the
following inference rule (the frame rule), which is admissible in separation logic.

\[ \frac{Q_1 \Rightarrow Q_2}{Q_1 * R \Rightarrow Q_2 * R} \]

To give an analogous example in our syntax, if the following holds

\[ \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k} \]

then the statement below does as well (provided $x$ and $y$ are program variables and not
instrumentation variables).

\[ \varphi * x \mapsto [data : y] \Longrightarrow^{\hat{k}'}_{S} \varphi' * x \mapsto [data : y] \;//\; \hat{k} \]

We then view proof search as proceeding from the bottom up. If we are ever faced with
a goal matching that given above, we can note that $x \mapsto [data : y]$ occurs on both sides,
discard it, and proceed to search for a proof of $\varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}$.

This relatively simple matching process becomes somewhat complicated in the presence
of instrumentation commands, pure formulae, and quantifiers, so the actual proof
search is performed over an expanded form of the judgment, which includes some
bookkeeping information.

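The rules for the proof system are given in Figures 5.8 and 5.9 and involve judgments
of the following form.

\[ \Sigma_a \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k} \]

The components other than $\Sigma_a$ are the same as before. The $\Sigma_a$ component exists to
aid in the matching process. As spatial predicates in $\varphi$ are matched with predicates in $\varphi'$,
the matched predicate is moved to $\Sigma_a$.

Formally, if the sequent

\[ \Sigma_a \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k} \]

is derivable, then the following holds

\[ \Gamma \vdash \{\Sigma_a * \varphi'\}\ \hat{k}' \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\Sigma_a * \varphi\}\ \hat{k} \blacktriangleright_{IVar} k \]

The following components are inputs in a bottom-up proof search using these rules.

\[ S,\ \hat{k}',\ \Sigma_a,\ \varphi,\ \varphi' \]

The only output is $\hat{k}$.

Our earlier notation $\varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}$ should be viewed as an abbreviation for the
following.

\[ emp \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k} \]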


Notation 

One common operation in the rules in Figures 5.8 and 5.9 is to check whether a spatial
formula is present in a symbolic state formula. We define the following notation to indicate
this check (where $=$ denotes the equality relation given in Figure 5.2).

\[ \Sigma' \in \varphi \;\stackrel{\text{def}}{=}\; (\varphi = \exists \vec{x}.\ \Sigma \wedge \Pi) \text{ and } \Sigma = \Sigma' * \Sigma_1 \text{ for some } \Sigma_1 \text{ and } fv(\Sigma') \cap \vec{x} = \emptyset \]

This implies that $\varphi$ is logically equivalent to $\varphi' * \Sigma'$, where $\varphi' = \exists \vec{x}.\ \Sigma_1 \wedge \Pi$ (using the
variable names in the definition above).

An example usage of this notation occurs in rule NotNull in Figure 5.8, where we
have $(e \mapsto \rho) \in (\Sigma_a * \varphi)$ as one of the premises. Recall that $\Sigma_a * \varphi$ denotes the symbolic
state formula $\varphi'$ that is semantically equivalent to $\Sigma_a * \varphi$ (for more details, see Section
5.1). The result is that the statement $(e \mapsto \rho) \in (\Sigma_a * \varphi)$ is true when $e \mapsto \rho$ is present
in either $\Sigma_a$ or $\varphi$, with quantified variables in $\Sigma_a$ and $\varphi$ handled appropriately (though, as
can be seen by examining the other rules, $\Sigma_a$ will never contain quantifiers).

As another example, consider the statement $((e_1 \mapsto \rho_1) * (e_2 \mapsto \rho_2)) \in (\Sigma_a * \varphi)$, as
present in the Disjoint rule. This is true if $e_1 \mapsto \rho_1$ and $e_2 \mapsto \rho_2$ both occur in $\Sigma_a$, or
both occur in $\varphi$, or if one occurs in $\Sigma_a$ and one occurs in $\varphi$. Thus, this notation gives us a
concise way of writing statements regarding the presence of spatial formulae which would
otherwise involve a great deal of disjunction.
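The membership check can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the thesis implementation: the `SymState` class, the `(kind, args)` encoding of spatial conjuncts, and the function names are all hypothetical, and equality of conjuncts is taken to be plain syntactic equality rather than the relation of Figure 5.2.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SymState:
    """A symbolic state formula: exists xs. (spatial part AND pure part)."""
    exists: frozenset   # names of existentially bound variables
    spatial: tuple      # spatial conjuncts, each written as (kind, args)
    pure: tuple = ()    # pure conjuncts, left opaque in this sketch


def mentioned_vars(conjuncts):
    """Collect every name used as an argument of the given spatial conjuncts."""
    names = set()
    for _kind, args in conjuncts:
        names.update(args)
    return names


def star(sigma_a, state):
    """Form Sigma_a * state, assuming Sigma_a is quantifier-free and
    mentions none of the state's bound variables."""
    return SymState(state.exists, tuple(sigma_a) + state.spatial, state.pure)


def occurs_in(sub, state):
    """Decide whether sub occurs in state: sub must be a sub-multiset of the
    spatial part and must not mention any existentially bound variable."""
    need, have = Counter(sub), Counter(state.spatial)
    is_part = all(have[c] >= n for c, n in need.items())
    return is_part and not (mentioned_vars(sub) & state.exists)


# Disjoint-style query: one cell lives in Sigma_a, the other in phi.
sigma_a = (("pto", ("y", "b")),)
phi = SymState(frozenset(), (("pto", ("x", "a")), ("ls", ("n", "a", "nil"))))
found = occurs_in((("pto", ("x", "a")), ("pto", ("y", "b"))), star(sigma_a, phi))

# A cell that mentions a bound variable is not considered present.
bound = SymState(frozenset({"z"}), (("pto", ("x", "z")),))
hidden = occurs_in((("pto", ("x", "z")),), bound)
```

Because the check runs over the combined spatial part of $\Sigma_a * \varphi$, the caller does not need to enumerate which side each cell came from, which is the disjunction the notation is meant to hide.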

Rule  Explanation  and  Soundness 

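We now go through each rule in turn, explaining its effect and presenting its soundness
proof. Soundness is shown via induction on the structure of the derivation. Intuitively, we
want a derivation of $\Sigma_a \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}$ to ensure that we can reach $\varphi'$ from $\varphi$. That is, via
repeated application of the instrumentation rules from Figure 4.1, we can construct some
continuation prefix that reaches the state $\varphi'$ along all of its branches. Formally, we have
the statement below.

Theorem 28. If $\Sigma_a \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}$ is derivable then for all $\Gamma$, $k$

\[ \Gamma \vdash \{\Sigma_a * \varphi'\}\ \hat{k}' \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\Sigma_a * \varphi\}\ \hat{k} \blacktriangleright_{IVar} k \]

Stated in terms of our abbreviated form of judgment, this becomes the following.

PropEqL
\[ \frac{\Sigma_a \;[]\; \varphi[e/x] \wedge x = e \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}}{\Sigma_a \;[]\; \varphi \wedge x = e \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}} \]

NotNull
\[ \frac{(e \mapsto \rho) \in (\Sigma_a * \varphi) \qquad \Sigma_a \;[]\; \varphi \wedge (e \neq nil) \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}}{\Sigma_a \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}} \]

Disjoint
\[ \frac{((e_1 \mapsto \rho_1) * (e_2 \mapsto \rho_2)) \in (\Sigma_a * \varphi) \qquad \Sigma_a \;[]\; \varphi \wedge (e_1 \neq e_2) \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}}{\Sigma_a \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}} \]

RightPure
\[ \frac{\Pi \Rightarrow \exists \vec{x}.\ \Pi' \text{ is valid}}{\Sigma_a \;[]\; emp \wedge \Pi \Longrightarrow^{\hat{k}'}_{S} \exists \vec{x}.\ emp \wedge \Pi' \;//\; \hat{k}'} \]

LeftPureFalse
\[ \frac{\Pi \Rightarrow false \text{ is valid}}{\Sigma_a \;[]\; \Sigma \wedge \Pi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \mathbf{assume}(false); \mathbf{halt}} \]

PtoMatches
\[ \frac{\Sigma_a * (e \mapsto \rho) \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}}{\Sigma_a \;[]\; (e \mapsto \rho) * \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' * (e \mapsto \rho) \;//\; \hat{k}} \]

PredMatches
\[ \frac{\Sigma_a * d(\vec{e}) \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}}{\Sigma_a \;[]\; d(\vec{e}) * \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' * d(\vec{e}) \;//\; \hat{k}} \]

Figure 5.8: Proof system for entailment. Basic rules.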

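Corollary 5. If $\varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}$ then for all $\Gamma$, $k$

\[ \Gamma \vdash \{\varphi'\}\ \hat{k}' \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} k \]

Proof. The proof is by induction on the structure of the derivation of $\Sigma_a \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}$.
We consider each case below.

PropEqL This rule propagates equalities throughout the formula on the left. Applying
our inductive hypothesis yields

\[ \Gamma \vdash \{\Sigma_a * \varphi'\}\ \hat{k}' \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\Sigma_a * (\varphi[e/x] \wedge x = e)\}\ \hat{k} \blacktriangleright_{IVar} k \qquad (5.6) \]

DefL
\[ \frac{\begin{array}{c} (d(\vec{v}) \Leftrightarrow \ldots \mid C_i(\vec{v}) \mid \ldots) \in S \qquad C_i(\vec{e}) = (\Pi_i : \mathbf{let}\ \vec{z}_i\ \mathbf{satisfy}\ \Pi_i'\ \mathbf{in}\ \varphi_i) \\ \forall i.\ \big(\Sigma_a \;[]\; (\varphi * \varphi_i) \wedge \Pi_i \wedge \Pi_i' \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}_i\big) \qquad \forall i.\ \vec{z}_i \notin fv(\varphi, \Sigma_a, \Pi_i) \end{array}}{\Sigma_a \;[]\; \varphi * d(\vec{e}) \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \mathbf{branch}\ \ldots,\ \Pi_i \Rightarrow \vec{z}_i := ?; \mathbf{assume}(\Pi_i'); \hat{k}_i,\ \ldots\ \mathbf{end}} \]

InstL
\[ \frac{\Sigma_a \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}}{\Sigma_a \;[]\; \varphi[e/x] \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; (x := e; \hat{k})} \qquad x \notin fv(\Sigma_a),\ fv(x, e) \subseteq IVar \]

ExistsR
\[ \frac{\Sigma_a \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi'[e/x] \;//\; \hat{k}}{\Sigma_a \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \exists x.\ \varphi' \;//\; \hat{k}} \]

ExistsL
\[ \frac{\Sigma_a \;[]\; \varphi[c/x] \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}}{\Sigma_a \;[]\; \exists x.\ \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; c := ?; \hat{k}} \qquad c \text{ fresh} \]

Figure 5.9: Proof system for entailment. Rules for inductively specified predicates and variables.
We write $\vec{z} := ?$ to indicate the sequence of commands $z_1 := ?; \ldots; z_n := ?$.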


We must show

\[ \Gamma \vdash \{\Sigma_a * \varphi'\}\ \hat{k}' \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\Sigma_a * (\varphi \wedge x = e)\}\ \hat{k} \blacktriangleright_{IVar} k \]

We first assume $\Gamma \vdash \{\Sigma_a * \varphi'\}\ \hat{k}' \blacktriangleright_{IVar} k$ and apply (5.6) to derive

\[ \Gamma \vdash \{\Sigma_a * (\varphi[e/x] \wedge x = e)\}\ \hat{k} \blacktriangleright_{IVar} k \]

We can then apply the Strengthening rule from Figure 4.1 to the formula above using
the following implication.

\[ (\Sigma_a * (\varphi \wedge x = e)) \Rightarrow (\Sigma_a * (\varphi[e/x] \wedge x = e)) \]

This yields

\[ \Gamma \vdash \{\Sigma_a * (\varphi \wedge x = e)\}\ \hat{k} \blacktriangleright_{IVar} k \]

which completes the proof.

Note that the antecedent of the goal matched the antecedent of the implication we
got from the inductive hypothesis (5.6). This will be the case for all rules, so we will
henceforth focus on showing that the conclusion of the implication from the inductive
hypothesis implies the conclusion of our goal.

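NotNull This rule adds $e \neq nil$ to our assumptions in cases where a cell at location $e$
has been shown to be present in the heap. For soundness, we have

\[ \Gamma \vdash \{\Sigma_a * (\varphi \wedge e \neq nil)\}\ \hat{k} \blacktriangleright_{IVar} k \]

and

\[ (e \mapsto \rho) \in (\Sigma_a * \varphi) \]

which, by our definition of this notation (see the Notation paragraph above), gives us

\[ (\Sigma_a * \varphi) = (e \mapsto \rho) * \varphi_1 \]

for some $\varphi_1$. Note that this implies

\[ \Sigma_a * \varphi \Rightarrow (e \neq nil) \]

We must show

\[ \Gamma \vdash \{\Sigma_a * \varphi\}\ \hat{k} \blacktriangleright_{IVar} k \]

This follows from Strengthening and the implication above.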


Disjoint This rule is similar to the one above, except that it uses the fact that both
$e_1 \mapsto \rho_1$ and $e_2 \mapsto \rho_2$ are present on the left to infer $e_1 \neq e_2$. We have

\[ \Gamma \vdash \{\Sigma_a * (\varphi \wedge e_1 \neq e_2)\}\ \hat{k} \blacktriangleright_{IVar} k \]

and

\[ ((e_1 \mapsto \rho_1) * (e_2 \mapsto \rho_2)) \in (\Sigma_a * \varphi) \]

This second fact implies

\[ (\Sigma_a * \varphi) = ((e_1 \mapsto \rho_1) * (e_2 \mapsto \rho_2)) * \varphi_1 \]

for some $\varphi_1$, which implies

\[ (\Sigma_a * \varphi) \Rightarrow e_1 \neq e_2 \]

We need to show

\[ \Gamma \vdash \{\Sigma_a * \varphi\}\ \hat{k} \blacktriangleright_{IVar} k \]

which follows from Strengthening and the implication above.


RightPure This is one of the axioms of the proof system. It is triggered when the
right-hand side becomes empty, that is, when the component to the right of the $\Longrightarrow_S$ no longer
contains any spatial predicates. In such a case, we check that the left also contains no
spatial predicates and that the pure entailment $\Pi \Rightarrow \exists \vec{x}.\ \Pi'$ holds. Since this entailment
does not involve spatial predicates, it can be sent to a standard theorem prover for
first-order logic plus arithmetic. We then set the output to $\hat{k}'$ (viewing the proof system as
specifying a bottom-up search algorithm). This output gets passed down the proof tree
and added to by various rules such as DefL, InstL, and ExistsL.

For the soundness proof, we have

\[ \Pi \Rightarrow \exists \vec{x}.\ \Pi' \]

and

\[ \Gamma \vdash \{\Sigma_a * (\exists \vec{x}.\ emp \wedge \Pi')\}\ \hat{k}' \blacktriangleright_{IVar} k \]

We must show that the following holds.

\[ \Gamma \vdash \{\Sigma_a * (emp \wedge \Pi)\}\ \hat{k}' \blacktriangleright_{IVar} k \]

This is a simple application of Strengthening with the following implication.

\[ (\Sigma_a * (emp \wedge \Pi)) \Rightarrow (\Sigma_a * (\exists \vec{x}.\ emp \wedge \Pi')) \]

The implication above follows directly from our assumption that $\Pi \Rightarrow \exists \vec{x}.\ \Pi'$.

LeftPureFalse This is the axiom that applies when the left-hand side has been
discovered to be unsatisfiable. As with RightPure, the pure entailment $\Pi \Rightarrow false$ can be
checked with a standard theorem prover for classical logic with arithmetic.

For the soundness proof in this case, we have $\Pi \Rightarrow false$ and must show

\[ \Gamma \vdash \{\Sigma_a * (\Sigma \wedge \Pi)\}\ (\mathbf{assume}(false); \mathbf{halt}) \blacktriangleright_{IVar} k \]

This is an application of False from Figure 4.1 to obtain

\[ \Gamma \vdash \{false\}\ \mathbf{halt} \blacktriangleright_{IVar} k \]

followed by Inst-Assume to obtain

\[ \Gamma \vdash \{false\}\ (\mathbf{assume}(false); \mathbf{halt}) \blacktriangleright_{IVar} k \]

followed by Strengthening with $\Sigma_a * (\Sigma \wedge \Pi) \Rightarrow false$ to obtain our goal.

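PtoMatches In this case, we match a points-to predicate on the left and the right. For
the soundness proof, we have

\[ \Gamma \vdash \{(\Sigma_a * (e \mapsto \rho)) * \varphi\}\ \hat{k} \blacktriangleright_{IVar} k \]

and must show

\[ \Gamma \vdash \{\Sigma_a * ((e \mapsto \rho) * \varphi)\}\ \hat{k} \blacktriangleright_{IVar} k \]

which follows immediately from Strengthening and associativity of $*$.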

PredMatches  This  is  the  same  as  PtoMatches  except  that  we  are  matching  an 
inductive  predicate  instance  instead  of  a  points-to  predicate. 

DefL In this case, we expand an inductive predicate on the left, case splitting on the
possible expansions. We insert a branch into the instrumented program, with one case
for each condition $\Pi_i$. In each case, we first non-deterministically assign the $\vec{z}_i$, then
assume $\Pi_i'$, which establishes the connection between the predicate's arguments and $\vec{z}_i$. Finally we insert $\hat{k}_i$, the
instrumented continuation for case $i$ of the inductive predicate.

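As an example, suppose $\varphi$ is as given below

\[ ls(n_1; x, y) * ls(n_2; y, nil) \wedge n_1 + n_2 > 0 \]

and $\varphi'$ is

\[ \exists z, v.\ x \mapsto [next : z, data : v] * ls(n_3; z, nil) \]

If we then search bottom-up for a proof of

\[ \Sigma_a \;[]\; \varphi \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k} \]

then the first step of entailment will be to case split on whether the first list segment in $\varphi$
is empty. This results in the following two sub-goals

\[ \Sigma_a \;[]\; ls(n_2; y, nil) \wedge x = y \wedge n_1 = 0 \wedge n_1 + n_2 > 0 \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}_1 \]

and

\[ \Sigma_a \;[]\; \exists z.\ x \mapsto [next : z] * ls(n_1'; z, y) * ls(n_2; y, nil) \wedge n_1 + n_2 > 0 \wedge n_1 > 0 \wedge n_1 = n_1' + 1 \Longrightarrow^{\hat{k}'}_{S} \varphi' \;//\; \hat{k}_2 \]

Assuming proofs of these subgoals are found (which in this case they are), then they
are combined such that the $\hat{k}$ returned is

\[ \mathbf{branch}\ n_1 = 0 \Rightarrow \mathbf{assume}(true); \hat{k}_1,\ \ n_1 > 0 \Rightarrow n_1' := ?; \mathbf{assume}(n_1 = n_1' + 1); \hat{k}_2\ \mathbf{end} \]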
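The way DefL assembles its output from the per-case continuations can be sketched in a few lines of Python. This is a hedged illustration only: the function name `defl_branch`, the triple encoding of each case as (guard, fresh variables, constraint), and the use of plain strings for commands are assumptions made for the example and do not come from the implementation described here.

```python
def defl_branch(cases, continuations):
    """Assemble the branch command that DefL emits.

    cases: one (guard, fresh_vars, constraint) triple per case of the
           inductive predicate, standing for Pi_i, z_i and Pi_i'.
    continuations: the instrumented continuation found for each case.
    """
    if len(cases) != len(continuations):
        raise ValueError("need exactly one continuation per case")
    arms = []
    for (guard, fresh_vars, constraint), k_i in zip(cases, continuations):
        # Non-deterministically assign each fresh variable, then assume Pi_i'.
        havoc = "".join(f"{z} := ?; " for z in fresh_vars)
        arms.append(f"{guard} => {havoc}assume({constraint}); {k_i}")
    return "branch " + ", ".join(arms) + " end"


# The two cases of the list segment predicate from the example above.
ls_cases = [
    ("n1 = 0", [], "true"),
    ("n1 > 0", ["n1'"], "n1 = n1' + 1"),
]
result = defl_branch(ls_cases, ["k1", "k2"])
```

With the two list-segment cases, `result` is the string `branch n1 = 0 => assume(true); k1, n1 > 0 => n1' := ?; assume(n1 = n1' + 1); k2 end`, which has the same shape as the branch shown in the example.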


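For the proof of soundness, we have the following for each $i$ from our inductive
hypotheses.

\[ \Gamma \vdash \{((\varphi * \varphi_i) \wedge \Pi_i \wedge \Pi_i') * \Sigma_a\}\ \hat{k}_i \blacktriangleright_{IVar} k \]

From each of these assumptions, we can construct the following proof. We write Str for
Strengthening, I-E for Inst-Exists, and I-A for Inst-Assume. Each line follows from the
line above it by the rule named on its left.

\[
\begin{array}{ll}
\text{Ind. Hyp.} & \Gamma \vdash \{((\varphi * \varphi_i) \wedge \Pi_i \wedge \Pi_i') * \Sigma_a\}\ \hat{k}_i \blacktriangleright_{IVar} k \\
\text{Str} & \Gamma \vdash \{(((\varphi * \varphi_i) \wedge \Pi_i) * \Sigma_a) \wedge \Pi_i'\}\ \hat{k}_i \blacktriangleright_{IVar} k \\
\text{I-A} & \Gamma \vdash \{(((\varphi * \varphi_i) \wedge \Pi_i) * \Sigma_a) \wedge \Pi_i'\}\ \mathbf{assume}(\Pi_i'); \hat{k}_i \blacktriangleright_{IVar} k \\
\text{I-E} & \Gamma \vdash \{\exists \vec{z}_i.\ (((\varphi * \varphi_i) \wedge \Pi_i) * \Sigma_a) \wedge \Pi_i'\}\ \vec{z}_i := ?; \mathbf{assume}(\Pi_i'); \hat{k}_i \blacktriangleright_{IVar} k \\
\text{Str} & \Gamma \vdash \{(\varphi * (\exists \vec{z}_i.\ \varphi_i \wedge \Pi_i') * \Sigma_a) \wedge \Pi_i\}\ \vec{z}_i := ?; \mathbf{assume}(\Pi_i'); \hat{k}_i \blacktriangleright_{IVar} k \qquad (\vec{z}_i \notin fv(\varphi, \Pi_i))
\end{array}
\]

Note that each assumption now has a precondition of the form below

\[ (\varphi * (\exists \vec{z}_i.\ \varphi_i \wedge \Pi_i') * \Sigma_a) \wedge \Pi_i \qquad (5.7) \]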


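Our goal is to show that the following holds, where $\hat{k}_b$ is the branch in the conclusion
of the rule.

\[ \Gamma \vdash \{(\varphi * d(\vec{e})) * \Sigma_a\}\ \hat{k}_b \blacktriangleright_{IVar} k \]

By expanding $d$ according to the same specification used in the premise of the rule we are
considering, we can see that the precondition in this formula is equivalent to the following.

\[ \varphi * \Big(\bigvee_i [\![ C_i(\vec{e}) ]\!]\Big) * \Sigma_a \]

Recall that $[\![ C_i(\vec{e}) ]\!]$ gives the interpretation of $C_i(\vec{e})$ as a separation logic formula. Applying
the definition of $[\![ C_i(\vec{e}) ]\!]$ we obtain the following

\[ \varphi * \Big(\bigvee_i \big(\Pi_i \wedge \exists \vec{z}_i.\ \varphi_i \wedge \Pi_i'\big)\Big) * \Sigma_a \]

By commuting and re-associating terms, we can rewrite this such that each disjunct is equal to formula
(5.7) for the corresponding $i$. The soundness of the branch that we add will then follow from an n-ary
version of the derived rule given below.

\[ \frac{Q \Rightarrow (Q_1 \wedge e_1) \vee (Q_2 \wedge e_2) \qquad \Gamma \vdash \{Q_1 \wedge e_1\}\ \hat{k}_1 \blacktriangleright_V k \qquad \Gamma \vdash \{Q_2 \wedge e_2\}\ \hat{k}_2 \blacktriangleright_V k}{\Gamma \vdash \{Q\}\ \mathbf{branch}\ e_1 \Rightarrow \hat{k}_1,\ e_2 \Rightarrow \hat{k}_2\ \mathbf{end} \blacktriangleright_V k} \]

This rule is simply Inst-Branch from Section 4.1.3 but with the premise
$Q \Rightarrow (Q_1 \wedge e_1) \vee (Q_2 \wedge e_2)$ instead of $Q \Rightarrow e_1 \vee e_2$ and preconditions $Q_i \wedge e_i$ instead of
$Q \wedge e_i$. The reasoning used to justify it is the same.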

InstL This rule is responsible for unifying the names of instrumentation variables. For
example, if the left-hand side of the sequent contains $ls(n + 1; x, nil)$ and the right-hand side
contains $ls(n; x, nil)$ then we cannot apply PredMatches to remove these nearly matching
spatial formulae until we have made the instrumentation variables match. Since we
are allowed to insert new commands that affect the instrumentation variables, we can add
the command $n := n + 1$ in order to connect the two formulae. The post-condition of the
left-hand side after executing this command is then $ls(n; x, nil)$ and the PredMatches
rule can be applied.

In order to show soundness, we assume $x \notin fv(\Sigma_a)$ and

\[ \Gamma \vdash \{\Sigma_a * \varphi\}\ \hat{k} \blacktriangleright_{IVar} k \]

By the Inst-Assign rule and the backward Hoare logic rule for assignment, we have

\[ \Gamma \vdash \{(\Sigma_a * \varphi)[e/x]\}\ (x := e; \hat{k}) \blacktriangleright_{IVar} k \]

We will then apply Strengthening to show that our goal, given below, follows.

\[ \Gamma \vdash \{\Sigma_a * \varphi[e/x]\}\ (x := e; \hat{k}) \blacktriangleright_{IVar} k \]

To do so, we must prove the implication

\[ (\Sigma_a * \varphi[e/x]) \Rightarrow ((\Sigma_a * \varphi)[e/x]) \]

We assume $\Sigma_a * \varphi[e/x]$. Then since $x \notin fv(\Sigma_a)$ we can extend the scope of the substitution,
obtaining the needed result.

\[ (\Sigma_a * \varphi)[e/x] \]
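The substitution at the heart of InstL can be made concrete with a short Python sketch. It is an illustration under assumptions, not the thesis code: terms are encoded as names, integers, or tuples `(head, *args)`, and the helper names `subst`, `free_vars`, and `inst_l` are hypothetical.

```python
def subst(term, var, expr):
    """Return term[expr/var]. A term is a name, an int, or (head, *args)."""
    if isinstance(term, tuple):
        head, *args = term
        return (head, *(subst(a, var, expr) for a in args))
    return expr if term == var else term


def free_vars(term):
    """Names occurring in argument positions of a term."""
    if isinstance(term, tuple):
        return set().union(*(free_vars(a) for a in term[1:]))
    return {term} if isinstance(term, str) else set()


def inst_l(sigma_a, phi, var, expr, k_hat):
    """Read InstL top-down: from a premise about phi and k_hat, build the
    conclusion's left-hand side phi[expr/var] and the continuation
    var := expr; k_hat."""
    if var in free_vars(sigma_a):
        raise ValueError("InstL requires var not free in Sigma_a")
    return subst(phi, var, expr), [("assign", var, expr)] + list(k_hat)


# Premise left-hand side ls(n; x, nil); chosen command n := n + 1.
premise_lhs = ("ls", "n", "x", "nil")
goal_lhs, k_out = inst_l(("emp",), premise_lhs, "n", ("+", "n", 1), [])
```

Here `goal_lhs` is the encoding of $ls(n + 1; x, nil)$: a goal with that left-hand side is reduced to one with $ls(n; x, nil)$ at the cost of prepending $n := n + 1$ to the continuation, mirroring the example in the text.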
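ExistsR This is the rule used to instantiate existentially quantified variables on the
right of $\Longrightarrow_S$. Reading it from top to bottom, if $\varphi'[e/x]$ follows from $\varphi$, then $\exists x.\ \varphi'$
follows from $\varphi$.

For soundness, we assume that for some $\Gamma$, $k$ we have $\Gamma \vdash \{\Sigma_a * (\exists x.\ \varphi')\}\ \hat{k}' \blacktriangleright_{IVar} k$.
We can then use Strengthening and the implication $\varphi'[e/x] \Rightarrow \exists x.\ \varphi'$ to obtain

\[ \Gamma \vdash \{\Sigma_a * \varphi'[e/x]\}\ \hat{k}' \blacktriangleright_{IVar} k \]

From our inductive hypothesis we have

\[ \Gamma \vdash \{\Sigma_a * \varphi'[e/x]\}\ \hat{k}' \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\Sigma_a * \varphi\}\ \hat{k} \blacktriangleright_{IVar} k \]

As we have established the antecedent of this implication, we can conclude

\[ \Gamma \vdash \{\Sigma_a * \varphi\}\ \hat{k} \blacktriangleright_{IVar} k \]

which is our goal.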

ExistsL This rule governs the elimination of existentially quantified variables on the
left and is justified using the Inst-Exists rule from Figure 4.1. We introduce a fresh
variable $c$ for the quantified variable, as this renaming is performed by our implementation.
It is not strictly necessary for soundness.

We must show the following

\[ \Gamma \vdash \{\Sigma_a * (\exists x.\ \varphi)\}\ c := ?; \hat{k} \blacktriangleright_{IVar} k \]

and we have the following as an assumption.

\[ \Gamma \vdash \{\Sigma_a * \varphi[c/x]\}\ \hat{k} \blacktriangleright_{IVar} k \]

We first apply Inst-Exists to obtain the statement below.

\[ \Gamma \vdash \{\exists c.\ \Sigma_a * \varphi[c/x]\}\ c := ?; \hat{k} \blacktriangleright_{IVar} k \]

That $c$ is fresh implies $c \notin fv(\Sigma_a)$ and thus $\Sigma_a * (\exists c.\ \varphi[c/x])$ is equivalent to
$\exists c.\ \Sigma_a * \varphi[c/x]$. Applying Strengthening with this equivalence yields the following.

\[ \Gamma \vdash \{\Sigma_a * (\exists c.\ \varphi[c/x])\}\ c := ?; \hat{k} \blacktriangleright_{IVar} k \]

We then note that since $c$ is fresh and thus $c \notin fv(\varphi)$, the formula $\exists c.\ \varphi[c/x]$ is an
alpha-variant of $\exists x.\ \varphi$. We thus have that $\exists x.\ \varphi$ implies $\exists c.\ \varphi[c/x]$ and can apply
Strengthening again to obtain the following, which is our goal.

\[ \Gamma \vdash \{\Sigma_a * (\exists x.\ \varphi)\}\ c := ?; \hat{k} \blacktriangleright_{IVar} k \]


Proof  Search  Structure 

There  are  many  potential  search  techniques  involving  the  rules  presented  in  Figures  5.8 
and  5.9.  Here  we  discuss  the  choices  we  made  in  our  implementation  of  this  proof  system. 

Our proof search procedure starts by eliminating all existentials on the left with the
ExistsL rule. Any new existentials that appear on the left during the search (e.g. by
the expansion of definitions) are also eliminated as soon as they arise. The procedure
then proceeds by inferring pure consequences of the heap assumptions (rules NotNull
and Disjoint), propagating equalities (rule PropEqL), introducing constants for existentials
on the left (ExistsL), expanding definitions (rule DefL), and matching spatial predicates
(rules PtoMatches and PredMatches). As spatial predicates are matched, they
are moved to the portion of the sequent to the left of the [] symbol. Once all spatial predicates
in $\varphi'$ have been matched, the proof search can terminate with the RightPure
rule, closing off the current branch. The search can also succeed via the LeftPureFalse
rule if the antecedent ever becomes inconsistent. The pure entailment checks present in
the premises of these rules (for example, $\Pi \Rightarrow \exists \vec{x}.\ \Pi'$) can be implemented as a call to
an automated theorem prover for classical logic. We use the SMT solver Yices [Dutertre
and Moura, 2006], but any prover with support for existential quantifiers and unbounded
integer variables would work.

There are a few rules that would seem to interfere with an efficient implementation of
the proof system. The ExistsR and InstL rules both require us to guess a substitution
to apply when moving from the inputs in the conclusion to the inputs in the premise.
However, this substitution can be delayed until the term to be substituted is clear. In our
implementation, we only apply these rules when attempting to match spatial predicates via
the PtoMatches or PredMatches rules. In such cases, we may have, for example

\[ x \mapsto [next : a, data : b] * \varphi \]

on the left and

\[ \exists z, q.\ x \mapsto [next : z, data : q] * \varphi' \]

on the right. In this case, we can apply the ExistsR rule to instantiate $z$ with $a$ and $q$ with
$b$, which results in the two points-to predicates matching according to the PtoMatches
rule.
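A minimal Python sketch of this delayed instantiation is given below. It is an assumption-laden illustration rather than the actual implementation: a points-to predicate is encoded as an address plus a field dictionary, the function name `match_pto` is hypothetical, and values are compared syntactically, whereas a real prover would presumably compare them modulo the pure part of the state.

```python
def match_pto(left, right, exist_vars):
    """Find witnesses for the right-hand existentials that make two
    points-to predicates identical, or return None if none exist.

    left, right: (address, {field: value}) pairs.
    exist_vars: names that are existentially bound on the right.
    """
    l_addr, l_fields = left
    r_addr, r_fields = right
    if l_addr != r_addr or l_fields.keys() != r_fields.keys():
        return None
    witnesses = {}
    for field, r_val in r_fields.items():
        l_val = l_fields[field]
        if r_val in exist_vars:
            # Record the witness; reject if the same variable needs two values.
            if witnesses.setdefault(r_val, l_val) != l_val:
                return None
        elif r_val != l_val:
            return None
    return witnesses


left_cell = ("x", {"next": "a", "data": "b"})
right_cell = ("x", {"next": "z", "data": "q"})
witnesses = match_pto(left_cell, right_cell, {"z", "q"})
```

For the example in the text, `witnesses` maps `z` to `a` and `q` to `b`, which are exactly the instantiations ExistsR needs before PtoMatches can fire.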

Inductive  Specifications 

The DefL rule first looks up a specification for the inductive predicate $d$ in the set of
specifications $S$. If there are multiple specifications, any one may be chosen. The side
conditions on this rule can always be satisfied by applying alpha conversion, since $\vec{z}_i$ is
considered bound in "let $\vec{z}_i$ satisfy $\Pi_i'$ in $\varphi_i$."

This expansion of inductive predicates is a potential source of non-termination for our
proof search. If we are not careful, we can end up repeatedly expanding definitions on the
left. The DefL rule is also the only source of branching in the proof system, and the number
of inductive predicate expansions applied has a large effect on the running time of our
proof search. To combat both these problems, we restrict the number of times a predicate
can be expanded. In our implementation, we associate an integer with each inductive
predicate instance and increment this counter each time the instance is expanded. This
integer starts at zero and, when it reaches some bound, we do not allow further expansion
of that predicate instance. The bound can be set via a command line argument. We have
found that a bound of one (allowing each predicate instance to be expanded once) is usually
sufficient, although in some cases two expansions are required. With a bound of two, we
have not yet had an example fail verification where the reason for failure was too few
predicate expansions (any failures have always been related to failure of the abstraction
heuristics described in Section 5.7 or failure to make the appropriate inductive predicate
specification available to the system).
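The expansion bound can be sketched as follows. This is a hypothetical illustration: the `PredInstance` class, the `expand` and `unfold_ls` functions, and in particular the choice that predicate instances produced by an unfolding inherit the incremented counter are assumptions of this sketch, since the text only says that a counter is associated with each instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredInstance:
    name: str
    args: tuple
    expansions: int = 0   # how many unfoldings led to this instance


def expand(inst, unfold, bound):
    """Unfold inst once, or return None if its counter has reached bound.

    unfold(inst) returns one list of (name, args) predicate instances per
    case of the definition. Assumption of this sketch: the new instances
    inherit the parent's counter plus one.
    """
    if inst.expansions >= bound:
        return None
    return [
        [PredInstance(name, args, inst.expansions + 1) for name, args in case]
        for case in unfold(inst)
    ]


def unfold_ls(inst):
    """Cases of ls(n; x, y): the empty segment, or one cell and a shorter tail."""
    n, _x, y = inst.args
    return [[], [("ls", (n + "'", "z", y))]]


root = PredInstance("ls", ("n", "x", "nil"))
cases = expand(root, unfold_ls, bound=1)
tail = cases[1][0]
blocked = expand(tail, unfold_ls, bound=1)
allowed = expand(tail, unfold_ls, bound=2)
```

With a bound of one, the tail produced by the first unfolding cannot be expanded again (`blocked` is `None`); raising the bound to two permits one further unfolding.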

Since predicate expansions are so costly in terms of execution time, we try to perform
them only when necessary. Our proof search will only apply DefL when no other rules
are applicable. When we do apply the expansion rules, we try to intelligently choose the
appropriate specification from $S$ to use. Suppose we are applying DefL to our current
goal formula. We will look at the formula on the right of the $\Longrightarrow_S$ arrow and see what
spatial predicates have not yet been matched. We then select a definition that can expose a
predicate matching one of the predicates we have on the right.

To compute what predicates a definition may generate, we start from an instance of
the definition with distinct variables in each argument position, say $d(\vec{x})$. We then recursively
expand $d$. As we perform the expansions, we replace any fresh variables that would
be generated with a wildcard variable. We also replace non-address variables with wildcards
and only record which non-emp spatial predicates are generated. Thus, we only
track what happens to the pointer-valued arguments of $d$ during expansion. For example,
suppose we have the doubly-linked list specification below.

\[
\begin{array}{l}
dll(k; p, first, last, n) \Leftrightarrow \\
\quad k = 0 : \mathbf{let}\ []\ \mathbf{satisfy}\ true\ \mathbf{in}\ emp \wedge first = n \wedge last = p \\
\quad \mid\ k > 0 : \mathbf{let}\ k'\ \mathbf{satisfy}\ k = k' + 1\ \mathbf{in} \\
\qquad \exists z.\ (first \mapsto [prev : p, next : z]) * dll(k'; first, z, last, n)
\end{array}
\]

Using _ to represent a wildcard variable, and expanding dll(_; a, b, c, d) once (and discarding
non-spatial predicates), we obtain the following.

    b ↦ [prev : a, next : _]        dll(_; b, _, c, d)

The first pattern cannot be expanded further, but the second pattern can. If we expand
dll(_; b, _, c, d) we obtain the following.

    _ ↦ [prev : b, next : _]        dll(_; _, _, c, d)
At this point, expanding any of these patterns results only in patterns that have already
been generated. Thus, we have generated all the patterns that will result from expanding
dll(_; a, b, c, d).

We then store these patterns in a data structure that supports efficient querying. This is
essentially a multimap from patterns to specifications that is aware of unification. Suppose
we look up ∃z. x ↦ [prev : y, next : z]. The map will see that this matches
b ↦ [prev : a, next : _]. It will bind b to x and a to y and return as one of its results
the pattern dll(_; y, x, _, _) along with the specification that was used to obtain it. This indicates
that expanding a predicate instance matching dll(_; y, x, _, _) will produce a points-to
predicate that matches ∃z. x ↦ [prev : y, next : z]. We then search the left formula of our
current goal for a spatial formula matching dll(_; y, x, _, _), expand it, and proceed.
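The lookup just described can be sketched in Python as a linear scan over a small table. This is a simplified illustration, not the efficient multimap the text refers to: the table layout, the `lookup` function, and the identifier `"dll_spec"` are hypothetical, the table holds only the first dll pattern for brevity, and existentially bound variables in the query are represented by the wildcard.

```python
WILD = "_"

# Each entry: (points-to pattern (address, prev, next),
#              predicate pattern it was generated from,
#              identifier of the specification used).
PATTERN_TABLE = [
    (("b", "a", WILD), ("dll", WILD, "a", "b", "c", "d"), "dll_spec"),
]


def lookup(query, table):
    """Return every (predicate instance, specification id) whose expansion
    can produce a points-to predicate matching query."""
    results = []
    for pto_pattern, (name, *args), spec_id in table:
        binding = {}
        for pat, actual in zip(pto_pattern, query):
            if pat == WILD or actual == WILD:
                continue   # a wildcard matches anything
            if binding.setdefault(pat, actual) != actual:
                break      # one pattern variable would need two values
        else:
            # Unbound pattern variables become wildcards in the result.
            instance = (name, *(binding.get(a, WILD) for a in args))
            results.append((instance, spec_id))
    return results


# Query: exists z. x |-> [prev : y, next : z], with the bound z as a wildcard.
matches = lookup(("x", "y", WILD), PATTERN_TABLE)
```

The single result is the encoding of dll(_; y, x, _, _) paired with `"dll_spec"`: `b` is bound to `x`, `a` to `y`, and the unconstrained arguments remain wildcards.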

We  can  generate  this  pattern  map  on  program  start-up  as  soon  as  we  read  in  the  list  of 
inductive  predicate  specifications  provided  by  the  user,  after  which  it  benefits  every  proof 
search  performed  by  the  analysis  (and  there  are  typically  hundreds  of  frame  inference 
queries  even  for  small  examples).  Applying  this  optimization  significantly  speeds  up  our 
proof  search.  Furthermore,  proof  search  is  by  far  the  major  contributor  to  running  time, 
thus  any  proof  search  optimizations  have  a  large  effect  on  total  running  time  of  the  analysis. 

Note  that  we  do  not  have  a  corresponding  “DefR”  rule  for  expanding  definitions  on 
the  right.  Such  a  rule  could  be  added,  but  has  proved  unnecessary  in  our  experiments.  We 
comment  further  on  this  in  Section  5.7,  which  discusses  abstraction,  as  this  is  the  operation 
that  renders  DefR  unnecessary. 


5.5.2  implies 

We now show how the proof system just presented is used to implement the implies
function. On the following page we give the implementation of implies. The
function call implies(φ, φ', k̂') takes the following arguments.

φ    An antecedent formula.
φ'   The consequent formula.
k̂'   An instrumentation of some continuation under precondition φ'.

Given an instrumentation k̂' of some continuation k starting from the precondition φ', a
call to implies(φ, φ', k̂') returns Some(k̂) if it can establish that k̂ is an instrumentation
of k with precondition φ. That is, if implies(φ, φ', k̂') = Some(k̂) then for all k

    Γ ⊢ {φ'} k̂' ▸_IVar k

implies

    Γ ⊢ {φ} k̂ ▸_IVar k

Function implies(φ, φ', k̂'). Assumes that Γ ⊢ {φ'} k̂' ▸_IVar k for some Γ and
k. If so, and implies returns Some(k̂), then Γ ⊢ {φ} k̂ ▸_IVar k holds for the same
Γ and k.

    let (φa, ca) = abstract(φ) in
    if φa ⇒^S_{k̂'} φ' // k̂ then
        return Some(Γ, (ca; k̂))
    else return None


The function first calls abstract(φ) in order to simplify the state formula. In particular,
abstract will fold inductive predicate definitions, which is something that our
entailment system does not do; entailment will only expand predicates on the left. For
example, abstract(∃k. x ↦ [next : k] * k ↦ [next : nil]) will return ls(n; x, nil) and
the instrumentation command n := 2. Entailment is not able to create instances of data
structures, nor for example to take

    ∃z. x ↦ [next : z] * ls(n; z, nil)

and discover this implies ls(n + 1; x, nil).

This is a deliberate choice, as restricting entailment only to expansionary rules significantly
decreases the search space and helps prevent cycles in the proof search. By
combining the expansionary behavior of entailment with the collapsing or summarizing
behavior of abstraction, we are able to perform all the inference steps necessary for our
instrumentation procedure while increasing efficiency of the component operations.

Following the call to abstract, the implies function then calls into entailment,
passing in the instrumented continuation k̂'. It then returns the instrumentation k̂ that is
discovered by entailment.

That implies satisfies its specification from Figure 5.7 follows directly from Corollary 5
and the specification of abstract.
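The control flow of implies can be sketched as below. This is a minimal illustration, not the thesis implementation: continuations are modelled as lists of command strings, the context Γ is omitted, and toy_abstract and toy_entail are hypothetical stand-ins (a lookup table and a syntactic equality test) for the real abstract function and entailment prover.

```python
def implies(phi, phi_prime, k_hat_prime, abstract, entail):
    """Return an instrumentation for precondition phi, or None if entailment fails."""
    phi_a, c_a = abstract(phi)                      # fold predicates, obtain command prefix c_a
    k_hat = entail(phi_a, phi_prime, k_hat_prime)   # expand-only proof search
    if k_hat is None:
        return None
    return c_a + k_hat                              # the sequence (c_a; k_hat)


# Hypothetical stand-ins, just enough to exercise the control flow.
def toy_abstract(phi):
    if phi == "∃k. x ↦ [next : k] * k ↦ [next : nil]":
        return "ls(n; x, nil)", ["n := 2"]
    return phi, []

def toy_entail(phi_a, phi_prime, k_hat_prime):
    # Syntactic equality only; a real prover would search for a derivation.
    return list(k_hat_prime) if phi_a == phi_prime else None


result = implies("∃k. x ↦ [next : k] * k ↦ [next : nil]", "ls(n; x, nil)", ["halt"],
                 toy_abstract, toy_entail)
```

The folding step is what makes the query succeed here: without it, the expand-only prover in this sketch has no way to turn two points-to facts into a list segment.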

5.5.3  Frame  Inference 

We now consider a slight modification of the proof system presented in Section 5.5.1.
Whereas the original proof system was able to answer queries of the form φ ⇒ φ', the
new system permits the case where φ' specifies a sub-heap of φ (implication, in contrast,
requires both formulae to describe heaps with the same domain). The problem is very
similar to the frame inference problem described in Berdine et al. [2005], but differs in
that we will need to produce instrumentation commands during the proof search. The
frame refers to that portion of the heap described by the hypothesis which is not in the
conclusion. Inferring frames is useful when a particular command requires a piece of heap
to exist but does not care whether the heap contains additional elements.

As an example of such a situation, consider the symbolic state

    φ = ls(n; x, nil) ∧ x ≠ nil

Suppose we are trying to take the post-condition of this state with respect to the command
x := x.next. Doing so requires us to show that a heap cell at x exists. In this case, such a
cell does exist since φ implies the following formula.

    φ' = ∃z, v. x ↦ [next : z, data : v] * ls(n − 1; z, nil)

However, we don't generally know this expanded version of the state formula. We would
like to be able to ask our proof system to show that x is in the heap and obtain φ' while
providing only φ and x. This is the sort of query facilitated by our system for frame
inference.

Frame inference is also useful for answering pure entailments. Suppose we have the
symbolic state

    φ = ls(n; x, nil) ∧ n = 0

and we want to know whether this implies x = nil. In this case, we can ask whether the
implication below holds.

    φ ⇒ x = nil

But note that this is different from the implications considered in Section 5.5.1. In
the previously-presented proof system for entailment, there was a spatial aspect to the
proving: we wanted all of the heap described by the antecedent to be accounted for by
the consequent. In this example, since the consequent is pure, we do not have this
requirement. The antecedent is allowed to describe any amount of heap. Such a situation is
captured by asking whether there is a frame that allows us to show x = nil follows from φ
(the particular frame does not matter, we only check that a valid frame exists).

Pure entailment could also be handled by our system for entailment from Section 5.5.1
if we allowed true to appear as a spatial formula. The example query above would then
correspond to the implication φ ⇒ (x = nil) * true. However, since we do not have "* true"
in our language of symbolic state formulae, pure entailment is more naturally built on top
of frame inference.


Formulae with holes  In order to account for queries such as “does the heap contain a
cell at address x?” which arise frequently when checking memory safety, we allow the
consequent of a frame inference query to contain the special points-to predicate x ↦ □.
The □ will match any record expression and is only allowed to occur once in any symbolic
state formula. Thus, the predicate x ↦ □ states that the heap contains a cell at address x,
but provides no information about the contents of the heap cell. This predicate is satisfied
by any heap consisting of a single cell at x. In particular, the set of fields present at x does
not matter, so the following are both valid implications.

    x ↦ [next : nil] ⇒ x ↦ □
    x ↦ [next : y, data : 0] ⇒ x ↦ □

Formally, we can give a semantics for x ↦ □ by extending the satisfaction relation in
Figure 2.7 with the following case.

    (s, h) ⊨ ea ↦ □  iff  h = {(⟦ea⟧ s, r)} for some r ∈ Records

The predicate x ↦ □ essentially acts as a pattern, ensuring that frame inference exposes
a points-to at the appropriate address. This operates somewhat like the common separation
logic abbreviation x ↦ −, which is frequently used as shorthand for ∃y. x ↦ y.
If we had variables of record type and permitted existential quantification over these, such
that y in ∃y. x ↦ y could represent some set of field bindings, then we could use a similar
abbreviation. Since we make limited use of these patterns (in particular, since we only
require at most one in any formula), we found it simpler to work with the weaker x ↦ □
form and avoid the complexities of introducing more types of variable.
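The semantic clause for the hole predicate can be checked against concrete states with a short sketch. The encoding is an assumption made for illustration: a stack is a dict from variable names to addresses, a heap is a dict from addresses to records, and address expressions are restricted to plain variables.

```python
def sat_points_to_any(stack, heap, addr_var):
    """(s, h) satisfies e ↦ □ iff h consists of exactly one cell, located at the value of e."""
    return len(heap) == 1 and stack[addr_var] in heap


s = {"x": 1, "y": 2}

# Both implications from the text: the fields present at x do not matter.
only_next = sat_points_to_any(s, {1: {"next": None}}, "x")
next_and_data = sat_points_to_any(s, {1: {"next": 2, "data": 0}}, "x")

# A heap with an extra cell, or with no cell at x, does not satisfy the predicate.
two_cells = sat_points_to_any(s, {1: {"next": 2}, 2: {"next": None}}, "x")
wrong_address = sat_points_to_any(s, {2: {"next": None}}, "x")
```

The record r is never inspected, which is the point of the pattern: only the domain of the heap is constrained.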

Judgment Form and Soundness

As just mentioned, our primary use of frame inference is to expose heap cells needed to
compute post-conditions for heap-manipulating commands. The structure of the judgment
we define must change slightly to accommodate this usage. The interface we will adopt is
the following.

Input:   φ    A symbolic state formula describing the current state.

         φ'   A symbolic state formula describing the heap that is required to be present.

         fk   A function that takes a formula φ'' and produces an optional pair (Γ', k̂'),
              where Γ' is a context and k̂' is an instrumented continuation.

         S    A set of inductive predicate specifications describing the data structures
              used.

         We also require that the input satisfy the following invariant:

             If fk(φ'') = Some(Γ', k̂') then Γ' ⊢ {φ''} k̂' ▸_IVar k

         Note that fk is parameterized by the continuation k that it produces an instrumentation
         of. This parameter is included to help make it clear which k is being considered during
         examples and proofs.

Output:  k̂    An instrumentation of k.

         Γ    A context.

         These outputs must satisfy Γ ⊢ {φ} k̂ ▸_IVar k.

The form of our judgment for frame inference will be the following.

    φ ⇒^S_fk φ' // Γ ⊢ k̂

where φ, φ', S, and fk are considered inputs and Γ and k̂ are the outputs.

Relation to Entailment  The function fk in frame inference corresponds to the input k̂'
from entailment. One might wonder why frame inference requires this input to be a function
while a single-valued input sufficed for entailment. The reason is that, when searching
for a frame that shows φ contains φ', we may find different frames along different branches
of the proof.

For example, let φ be the following formula

    (ls(n1; x, y) * ls(n2; y, x)) ∧ (n1 + n2 > 0)

and suppose we want to show the following.

    φ ⇒^S_fk x ↦ □ // Γ ⊢ k̂

We know from n1 + n2 > 0 that at least one of the two lists is non-empty and thus x is in
the heap. However, the portion of the heap that remains when we separate out x is different
depending on whether n1 > 0. If n1 > 0 then we have that φ implies the following.

    ∃z, v. x ↦ [next : z, data : v] * ls(n1 − 1; z, y) * ls(n2; y, x)        (5.8)


If n1 = 0 then we have that φ implies the formula below.

    ∃z, v. (y ↦ [next : z, data : v] * ls(n2 − 1; z, x)) ∧ x = y        (5.9)

We use the function fk to account for this. In the above example, fk would be expected
to produce an instrumentation for each of these possible preconditions. Let φ1 be formula
(5.8) and φ2 be formula (5.9). If fk(φ1) = Some(Γ1, k̂1) and fk(φ2) = Some(Γ2, k̂2) then
a valid instrumentation from the precondition φ is

    branch n1 > 0 ⇒ k̂1,
           n1 = 0 ⇒ k̂2 end

Let this continuation be k̂. We then have the following.

    Γ1 ∪ Γ2 ⊢ {φ} k̂ ▸_IVar k

This fact, that the output of frame inference results in a valid instrumentation of k, is
the main soundness theorem for frame inference and is discussed further below.
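The reason fk must be a function rather than a single value can be made concrete with a small sketch. The names instrument_cases and toy_fk are hypothetical, formulas are opaque strings, contexts are dicts from labels to sets of formulas, and continuations are lists of command strings; none of this encoding comes from the thesis.

```python
def instrument_cases(cases, fk):
    """Call fk once per exposed precondition and assemble a branch over the guards."""
    context, arms = {}, []
    for guard, phi_i in cases:
        result = fk(phi_i)
        if result is None:
            return None                      # no instrumentation for this case
        gamma_i, k_i = result
        for label, formulas in gamma_i.items():
            # Merge contexts label by label (sets are read disjunctively).
            context.setdefault(label, set()).update(formulas)
        arms.append((guard, k_i))
    return context, ("branch", arms)


# Hypothetical generator: a different context and continuation per precondition.
def toy_fk(phi):
    table = {
        "phi1": ({"L1": {"A"}}, ["k1"]),
        "phi2": ({"L1": {"B"}, "L2": {"C"}}, ["k2"]),
    }
    return table.get(phi)


out = instrument_cases([("n1 > 0", "phi1"), ("n1 = 0", "phi2")], toy_fk)
```

Each branch of the proof hands fk a different precondition, so a single precomputed instrumentation could not serve both arms.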

As with entailment, we track some extra bookkeeping information during the search
for a proof in the form of a list of matched spatial formulae Σa. This plays the same role
it did in entailment and is described on page 223. The statement φ ⇒^S_fk φ' // Γ ⊢ k̂ is an
abbreviation for the following judgment, which tracks this extra information.

    emp [] φ ⇒^S_fk φ' // Γ ⊢ k̂

Soundness  As with entailment, the soundness result we will seek states that the output
of frame inference is a valid instrumentation.

Theorem 29. If Σa [] φ ⇒^S_fk φ' // Γ ⊢ k̂ is derivable then so is

    Γ ⊢ {Σa * φ} k̂ ▸_IVar k

Stated in terms of our abbreviated form of judgment, this becomes the following.

Corollary 6. If φ ⇒^S_fk φ' // Γ ⊢ k̂ is derivable then so is

    Γ ⊢ {φ} k̂ ▸_IVar k

Since a major use of frame inference in our system is to rewrite symbolic state formulae
into a given form, it is also worth showing that the function fk is called with arguments
of the appropriate form. This is captured by the following theorem, which states that the
instrumentation function fk is only called with symbolic states φ'' which have been shown
to describe a heap containing some sub-heap satisfying φ', the symbolic state formula to
the right of the ⇒.

Theorem 30. In a derivation of

    Σa [] φ ⇒^S_fk φ' // Γ ⊢ k̂

the function fk is only called with inputs of the form (φ'' * Σa) for some φ'' such that
φ'' ⇒ φ' * true.

Stated in terms of our abbreviated form of judgment, this becomes the following.

Corollary 7. In a derivation of φ ⇒^S_fk φ' // Γ ⊢ k̂, the function fk is only called with
inputs φ'' such that φ'' ⇒ φ' * true.

Rules and Proof of Soundness

We now present the rules for frame inference along with a proof of Theorems 29 and 30
(which are shown by structural induction on the frame inference derivation). Most of the
rules are the same as for entailment, with the only difference being the replacement of
input k̂' with the input function fk and the inclusion of the output context Γ. For example,
the rule PropEqL becomes the following.

PropEqL

    Σa [] φ[e/x] ∧ x = e ⇒^S_fk φ' // Γ ⊢ k̂
    ----------------------------------------
    Σa [] φ ∧ x = e ⇒^S_fk φ' // Γ ⊢ k̂
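PropEqL

    Σa [] φ[e/x] ∧ x = e ⇒^S_fk φ' // Γ ⊢ k̂
    ----------------------------------------
    Σa [] φ ∧ x = e ⇒^S_fk φ' // Γ ⊢ k̂

NotNull

    (e ↦ ρ) ∈ (Σa * φ)      Σa [] φ ∧ (e ≠ nil) ⇒^S_fk φ' // Γ ⊢ k̂
    ----------------------------------------------------------------
    Σa [] φ ⇒^S_fk φ' // Γ ⊢ k̂

Disjoint

    ((e1 ↦ ρ1) * (e2 ↦ ρ2)) ∈ (Σa * φ)      Σa [] φ ∧ (e1 ≠ e2) ⇒^S_fk φ' // Γ ⊢ k̂
    --------------------------------------------------------------------------------
    Σa [] φ ⇒^S_fk φ' // Γ ⊢ k̂

LeftPureFalse

    Π ⇒ false is valid
    ---------------------------------------------------
    Σa [] Σ ∧ Π ⇒^S_fk φ' // ∅ ⊢ assume(false); halt

PtoMatches

    Σa * (e ↦ ρ) [] φ ⇒^S_fk φ' // Γ ⊢ k̂
    ------------------------------------------------
    Σa [] (e ↦ ρ) * φ ⇒^S_fk φ' * (e ↦ ρ) // Γ ⊢ k̂

PredMatches

    Σa * d(e⃗) [] φ ⇒^S_fk φ' // Γ ⊢ k̂
    ----------------------------------------------
    Σa [] d(e⃗) * φ ⇒^S_fk φ' * d(e⃗) // Γ ⊢ k̂

InstL

    Σa [] φ ⇒^S_fk φ' // Γ ⊢ k̂
    ----------------------------------------------    x ∉ fv(Σa), fv(φ', e)
    Σa [] φ[e/x] ⇒^S_fk φ' // Γ ⊢ (x := e; k̂)

ExistsR

    Σa [] φ ⇒^S_fk φ'[e/x] // Γ ⊢ k̂
    ----------------------------------
    Σa [] φ ⇒^S_fk ∃x. φ' // Γ ⊢ k̂

ExistsL

    Σa [] φ[c/x] ⇒^S_fk φ' // Γ ⊢ k̂
    ------------------------------------------    c fresh
    Σa [] ∃x. φ ⇒^S_fk φ' // Γ ⊢ c := ?; k̂

Figure 5.10: Rules for frame inference that are the same as for entailment.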
The full list of rules that are essentially unchanged is given in Figure 5.10.

The first rule that is different is RightPure. In the system for frame inference, rather
than returning the k̂' that was passed in as the output instrumentation, we instead call fk to
obtain the output instrumentation. We also no longer require that the spatial portion of the
left-hand formula be empty. The new rule is given in Figure 5.11.

We also must change the DefL rule to account for the fact that each branch of the proof
may return a different context (the other rules do not branch and thus just pass the context
from the premise through to the conclusion). The new rule merges the contexts from the
premises using the union operation defined for contexts on page 204. The updated version
is given in Figure 5.11.

Finally, we must add a rule to handle our new x ↦ □ construct. This is given as rule
PtoMatchesAny in Figure 5.11 and captures the fact that x ↦ □ on the right matches
any points-to predicate of the form x ↦ ρ on the left.

Proof of Soundness  The proof of Theorem 29 for the rules in Figure 5.10 is the same as
for Theorem 28, which was described on page 243. The only difference is the presence of
Γ and the fact that fk is a function.

We take the rule PropEqL as a representative example. In the proof for PropEqL for
entailment we showed that given

    Γ ⊢ {Σa * (φ[e/x] ∧ x = e)} k̂ ▸_IVar k        (5.10)

we can derive the following by application of the Strengthening rule from Figure 4.1.

    Γ ⊢ {Σa * (φ ∧ x = e)} k̂ ▸_IVar k

For entailment, the inductive hypothesis and our goal were both implications and (5.10)
was the conclusion of the inductive hypothesis. In the soundness theorem for frame inference,
we get (5.10) directly from the inductive hypothesis. Once (5.10) is obtained, further
reasoning is the same. We apply Strengthening with the implication below.

    (Σa * (φ ∧ x = e)) ⇒ (Σa * (φ[e/x] ∧ x = e))

We now consider the rules in Figure 5.11.
RightPure

    Π ⇒ ∃x⃗. Π'      fk(∃x⃗. (Σa * Σ) ∧ Π') = Some(Γ, k̂)
    ------------------------------------------------------
    Σa [] Σ ∧ Π ⇒^S_fk ∃x⃗. emp ∧ Π' // Γ ⊢ k̂

DefL

    (d(v⃗) ⇔ ... | Ci(v⃗) | ...) ∈ S
    Ci(e⃗) = (Πi : let z⃗i satisfy Π'i in φi)
    ∀i. (Σa [] (φ * φi) ∧ Πi ∧ Π'i ⇒^S_fk φ' // Γi ⊢ k̂i)
    ---------------------------------------------------------------    ∀i. z⃗i ∉ fv(φ, Σa, Πi)
    Σa [] φ * d(e⃗) ⇒^S_fk φ' //
        ⋃i(Γi) ⊢ branch ..., Πi ⇒ z⃗i := ?; assume(Π'i); k̂i, ... end

PtoMatchesAny

    Σa * (e ↦ ρ) [] φ ⇒^S_fk φ' // Γ ⊢ k̂
    ------------------------------------------------
    Σa [] (e ↦ ρ) * φ ⇒^S_fk φ' * (e ↦ □) // Γ ⊢ k̂

Figure 5.11: Rules for frame inference that differ from those for entailment.
RightPure  We are given Π ⇒ ∃x⃗. Π' from the first premise and

    Γ ⊢ {∃x⃗. (Σa * Σ) ∧ Π'} k̂ ▸_IVar k

from our requirement that fk produce valid instrumentations of k. We then must show the
following.

    Γ ⊢ {(Σa * Σ) ∧ Π} k̂ ▸_IVar k

This follows from our assumption on k̂ by the Strengthening rule from Figure 4.1 together
with the implication below.

    (Σa * Σ) ∧ Π ⇒ ∃x⃗. (Σa * Σ) ∧ Π'

The implication holds since Π ⇒ ∃x⃗. Π' implies the following.

    (Σa * Σ) ∧ Π ⇒ (Σa * Σ) ∧ (∃x⃗. Π')

The scope of the existential on x⃗ can then be extended, as ∃x⃗. Π' can always be alpha-varied
such that x⃗ ∩ fv(Σa, Σ) = ∅.
PtoMatchesAny  This case follows the same reasoning as for PtoMatches, as the
only difference in the rules involves the formula on the right-hand side of the sequent
arrow, which does not participate in the statement of this theorem.

DefL  We have the following from our inductive hypothesis applied to each premise
(Σa [] (φ * φi) ∧ Πi ∧ Π'i ⇒^S_fk φ' // Γi ⊢ k̂i).

    Γi ⊢ {Σa * ((φ * φi) ∧ Πi ∧ Π'i)} k̂i ▸_IVar k

We then follow the same reasoning as in the proof for our entailment system (Theorem
28), generating the following result for each premise.

    Γi ⊢ {(φ * (∃z⃗i. φi ∧ Π'i) * Σa) ∧ Πi}
         z⃗i := ?; assume(Π'i); k̂i ▸_IVar k

Note that each assumption now has a precondition of the form below

    (φ * (∃z⃗i. φi ∧ Π'i) * Σa) ∧ Πi        (5.11)

Our goal is to show that the following holds, where k̂b is the branch in the instrumented
continuation in the conclusion of the DefL rule (which has the form branch ... end).

    Γ ⊢ {(φ * d(e⃗)) * Σa} k̂b ▸_IVar k

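As with entailment, we note that the precondition in the formula above is equivalent to the
following, obtained by unfolding the definition of d(e⃗) into the disjunction of its cases.

    (φ * (⋁i (Πi ∧ ∃z⃗i. φi ∧ Π'i))) * Σa
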
By commuting and re-associating terms, we can rewrite this such that it is equal to equation
(5.11) for each i. In entailment, we then had that the soundness of the branch that we add
follows from an n-ary version of the derived rule below.

    Q ⇒ (Q1 ∧ e1) ∨ (Q2 ∧ e2)      Γ ⊢ {Q1 ∧ e1} k̂1 ▸_V k      Γ ⊢ {Q2 ∧ e2} k̂2 ▸_V k
    -----------------------------------------------------------------------------------
    Γ ⊢ {Q} branch e1 ⇒ k̂1, e2 ⇒ k̂2 end ▸_V k
This was the extent of the proof for this case in Theorem 28. For frame inference, one
more step is necessary. We have to address the fact that the statements of valid instrumentation
for our premises do not involve the same context. For this reason, we need the rule
below.

    Q ⇒ (Q1 ∧ e1) ∨ (Q2 ∧ e2)
    Γ1 ⊢ {Q1 ∧ e1} k̂1 ▸_V k      Γ2 ⊢ {Q2 ∧ e2} k̂2 ▸_V k
    --------------------------------------------------------- Inst-Branch'
    Γ1 ∪ Γ2 ⊢ {Q} branch e1 ⇒ k̂1, e2 ⇒ k̂2 end ▸_V k

This can be derived from the previous rule (where the contexts were required to be the
same) by making use of Lemma 12. Recall that (Γ ∪ Γ')(l) = Γ(l) ∨ Γ'(l).³ Since
Γ(l) ⇒ Γ(l) ∨ Γ'(l) and Γ'(l) ⇒ Γ(l) ∨ Γ'(l) we can unify the contexts present in the
premises of our desired inference rule above, obtaining the following derivation, which
establishes this as a valid derived rule and completes the proof of soundness for this case.

    Γ1 ⊢ {Q1 ∧ e1} k̂1 ▸_V k                      Γ2 ⊢ {Q2 ∧ e2} k̂2 ▸_V k
    ------------------------------ Lem. 12       ------------------------------ Lem. 12
    Γ1 ∪ Γ2 ⊢ {Q1 ∧ e1} k̂1 ▸_V k                Γ1 ∪ Γ2 ⊢ {Q2 ∧ e2} k̂2 ▸_V k

                        Q ⇒ (Q1 ∧ e1) ∨ (Q2 ∧ e2)
    ------------------------------------------------------------------------ Inst-Branch
    Γ1 ∪ Γ2 ⊢ {Q} branch e1 ⇒ k̂1, e2 ⇒ k̂2 end ▸_V k
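The weakening step in this derivation relies only on the fact that each premise context is contained in the union. A minimal sketch, assuming contexts are encoded as dicts from labels to sets of formulas (read disjunctively, as in the footnote) and with the hypothetical names union_contexts and weaker_than:

```python
def union_contexts(g1, g2):
    """(Γ ∪ Γ')(l) = Γ(l) ∨ Γ'(l), with each disjunction represented as a set of formulas."""
    out = {label: set(formulas) for label, formulas in g1.items()}
    for label, formulas in g2.items():
        out.setdefault(label, set()).update(formulas)
    return out

def weaker_than(g, g_union):
    """Γ(l) ⇒ (Γ ∪ Γ')(l) for every label l: every disjunct of Γ(l) survives in the union."""
    return all(formulas <= g_union.get(label, set()) for label, formulas in g.items())


gamma1 = {"L1": {"ls(n; x, nil)"}}
gamma2 = {"L1": {"emp ∧ x = nil"}, "L2": {"x ↦ [next : nil]"}}
merged = union_contexts(gamma1, gamma2)
```

Both gamma1 and gamma2 are weaker_than the merged context, which mirrors the two implications used to apply Lemma 12 to each premise.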

Proper Form  We now show the proof for Theorem 30, which states that fk is only called
with inputs of the appropriate form. The proof is by induction on the derivation of

    Σa [] φ ⇒^S_fk φ' // Γ ⊢ k̂

For rules where φ' and Σa are identical in the premise and conclusion of the rule, our
result follows immediately from the inductive hypothesis. This includes rules PropEqL,
NotNull, Disjoint, InstL, ExistsL, and DefL. For LeftPureFalse there is nothing to
prove, as fk is not called in the derivation (this rule is an axiom that does not call fk).

³ Technically, contexts in this chapter map locations to sets of symbolic state formulas, whereas the
contexts in Chapter 4 mapped locations to separation logic formulas. However, since we are interpreting
sets of symbolic state formulas disjunctively, the equality given here in terms of formulas holds.
We now consider each of the other rules.

PtoMatches  We have from our inductive hypothesis that fk is only called with inputs
of the form

    (φ'' * (Σa * (e ↦ ρ)))

for some φ'' such that φ'' ⇒ φ' * true. We must show that fk is only called with inputs of
the form (φ''' * Σa) such that φ''' ⇒ (φ' * (e ↦ ρ) * true). We let φ''' = φ'' * (e ↦ ρ). To
complete the proof, we must show (φ'' * (e ↦ ρ)) ⇒ (φ' * (e ↦ ρ) * true). This follows
directly from our assumption φ'' ⇒ φ' * true and the fact that, in separation logic, if p ⇒ q
is valid, then so is p * r ⇒ q * r.

PredMatches  The proof for this case is the same as for PtoMatches, but with d(e⃗)
substituted for e ↦ ρ.

PtoMatchesAny  We have from our inductive hypothesis that fk is only called with
inputs of the form

    (φ'' * (Σa * (e ↦ ρ)))

for some φ'' such that φ'' ⇒ φ' * true. We must show that fk is only called with inputs of
the form (φ''' * Σa) such that φ''' ⇒ (φ' * (e ↦ □) * true). We let φ''' = φ'' * (e ↦ ρ).
To complete the proof, we must then show (φ'' * (e ↦ ρ)) ⇒ (φ' * (e ↦ □) * true).
This follows directly from our assumption φ'' ⇒ φ' * true and the fact that e ↦ ρ implies
e ↦ □.

ExistsR  We have from our inductive hypothesis that fk is only called with inputs of
the form

    (φ'' * Σa)

for some φ'' such that φ'' ⇒ φ'[e/x] * true. We must show that fk is only called with
inputs of the form φ''' * Σa such that φ''' ⇒ (∃x. φ' * true). We let φ''' = φ''. Because
φ'[e/x] ⇒ ∃x. φ' we then have φ''' ⇒ (∃x. φ') * true which is our goal.
RightPure  This is the only axiom that calls fk and thus is the base case for this proof.
The argument passed to fk is the following

    ∃x⃗. (Σa * Σ) ∧ Π'

We must show that this has the form φ'' * Σa where φ'' ⇒ (∃x⃗. emp ∧ Π') * true. We let
φ'' be ∃x⃗. Σ ∧ Π'. We then must show

    (∃x⃗. Σ ∧ Π') ⇒ (∃x⃗. emp ∧ Π') * true

We first assume (∃x⃗. Σ ∧ Π'). From this and the tautology Σ ⇒ true, we have that
∃x⃗. true ∧ Π' holds. Since true ⇔ true * emp we have ∃x⃗. (true * emp) ∧ Π'. Since Π'
is pure this implies ∃x⃗. true * (emp ∧ Π'). Applying commutativity of * and moving true
outside the scope of the existential quantifier then gives us our result.

Usage  Example 

We  now  provide  an  example  designed  to  give  some  intuition  into  the  use  of  frame  inference 
in  the  construction  of  an  instrumentation. 

One main problem that we are introducing frame inference to address is the failure of
post-conditions to match up with preconditions in general. Our partialPost function
on page 217 requires the preconditions of commands that access a heap cell at x to explicitly
contain a points-to predicate at x. Often, the precondition does not have this form, but
can be shown to imply one which does. In such cases, having a method of proving this
implication allows us to proceed with our program analysis.

Suppose we are instrumenting continuation k which is equal to (x := x.next); k'.
Further assume that we have a precondition of ls(n; x, nil) ∧ x ≠ nil. In order to apply
partialPost, we need a precondition of the form ∃y. ((x ↦ [ρ]) * Σ) ∧ Π. We
can then construct a frame inference query that produces an instrumentation starting from
ls(n; x, nil) ∧ x ≠ nil as follows.

Let fk be the function below.

    fk = λs1. instPost(s1, x := x.next, λs2. geninstCont(∅, s2, k'))

Then the frame inference query that we want is the one below.

    ls(n; x, nil) ∧ x ≠ nil ⇒^S_fk (x ↦ □) // Γ ⊢ k̂

This is an abbreviation for the query below, which initiates a proof search using the rules
in Figures 5.10 and 5.11.

    emp [] ls(n; x, nil) ∧ x ≠ nil ⇒^S_fk (x ↦ □) // Γ ⊢ k̂

5.5.4 exposeCellThenInst

The function exposeCellThenInst provides the interface to frame inference in
our implementation. The code for this function is given on the next page. The call
exposeCellThenInst(φ, x, fk) takes the following arguments.

φ    A symbolic state formula that gives the current precondition.
x    The address of the heap cell to be revealed.
fk   The instrumentation generator to apply to the formula that results from
     showing that x is in the heap.

If exposeCellThenInst returns Some(Γ, k̂) then these must satisfy

    Γ ⊢ {φ} k̂ ▸_IVar k

This function issues a frame inference query with the pattern x ↦ □ on the right in
order to expose the heap cell at x. The sequent φ ⇒^S_fk x ↦ □ // Γ ⊢ k̂ will be derivable
only if x can be shown to be in the heap. If the cell at x is indeed exposed, then fk will
be called with the resulting heap. This gives us a method of converting symbolic state
formulae to the form expected by the partialPost function presented on page 217.

The soundness result for frame inference tells us that the following holds.

    Γ ⊢ {φ} k̂ ▸_IVar k

which is exactly what is required for exposeCellThenInst to satisfy its specification
from Figure 5.7.
Function exposeCellThenInst(φ, x, fk). Exposes the heap cell at x by attempting
to prove an implication of the form φ ⇒ (x ↦ □) * φ' where the box
represents any record expression. If this proof succeeds, then the instrumentation
generator fk is applied to the formula that results.

    let f'k = fk(φ) in
    Search for proof of φ ⇒^S_{f'k} x ↦ □ // Γ ⊢ k̂. (The elements Γ and k̂ are returned
    by the proof procedure if a proof is found. The others are provided as inputs.)
    if proof is found and proof procedure returns Γ, k̂ then
        return Some(Γ, k̂)
    else
        return None
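To give some intuition for what the proof search does on behalf of this function, here is a heavily simplified sketch for singly linked lists only. Everything in it is an assumption made for illustration: spatial formulae are lists of tuples, pure facts are a set of tuples, continuations are lists of command strings, the context Γ is omitted, and the existential witness z is introduced without the z := ? command that ExistsL would emit. It is not the proof procedure of the thesis.

```python
import itertools

_counter = itertools.count()

def fresh(prefix):
    """Generate a fresh variable name (toy scheme)."""
    return f"{prefix}{next(_counter)}"

def expose_cell_then_inst(spatial, pure, x, fk):
    """Expose a points-to at x, unfolding ls(n; x, y) if needed, then call fk on the result."""
    # A points-to at x is already present: x ↦ □ matches it directly (PtoMatchesAny).
    if any(atom[0] == "pto" and atom[1] == x for atom in spatial):
        return fk(spatial, pure)
    # Otherwise unfold a list segment that starts at x (DefL).
    for i, atom in enumerate(spatial):
        if atom[0] == "ls" and atom[2] == x:
            _, n, _, y = atom
            rest = spatial[:i] + spatial[i + 1:]
            # Empty case: emp ∧ x = y. It is contradictory only if x ≠ y is known.
            if ("neq", x, y) not in pure:
                return None                      # x may not be allocated
            k_empty = ["assume(false)", "halt"]  # LeftPureFalse
            # Non-empty case: x ↦ [next : z] * ls(n0; z, y) with n = n0 + 1.
            n0, z = fresh("n"), fresh("z")
            unfolded = rest + [("pto", x, {"next": z}), ("ls", n0, z, y)]
            k_rest = fk(unfolded, pure | {("eq", n, f"{n0} + 1")})
            if k_rest is None:
                return None
            return [("branch",
                     (f"{n} = 0", k_empty),
                     (f"{n} > 0", [f"{n0} := ?", f"assume({n} = {n0} + 1)"] + k_rest))]
    return None


result = expose_cell_then_inst([("ls", "n", "x", "nil")],
                               frozenset({("neq", "x", "nil")}),
                               "x",
                               lambda spatial, pure: ["k"])
```

For the precondition ls(n; x, nil) ∧ x ≠ nil the sketch produces a two-armed branch: the n = 0 arm is dead code, and the n > 0 arm records n = n0 + 1 before continuing with whatever fk returns.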


5.6  Example 

We  now  pause  to  present  an  example  of  the  automated  analysis  we  have  developed  thus 
far.  We  will  consider  the  following  inductive  specification  of  a  singly  linked  list. 

    ls(n; x, y) ⇔
          n = 0 : let [ ] satisfy true in emp ∧ x = y
        | n > 0 : let n' satisfy n = n' + 1 in
                  ∃z. (x ↦ [next : z]) * ls(n'; z, y)

And analyze the following program, which traverses a list of this form.

    L1 : ① branch x ≠ nil ⇒ ② x := x.next; ③ goto L1,
                  x = nil ⇒ ④ halt end
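The inductive definition and the traversal can both be run against a concrete heap, which makes the connection between the parameter n and the loop visible. This is a sketch under assumed encodings: a heap is a dict from addresses to records, nil is Python's None, and the function names holds_ls and traverse are hypothetical.

```python
def holds_ls(heap, n, x, y):
    """Check ls(n; x, y) against exactly `heap`, following the two cases of the definition."""
    if n == 0:
        return len(heap) == 0 and x == y           # emp ∧ x = y
    if n < 0 or x not in heap:
        return False
    z = heap[x]["next"]                            # x ↦ [next : z]
    rest = {a: r for a, r in heap.items() if a != x}
    return holds_ls(rest, n - 1, z, y)             # ls(n'; z, y) with n = n' + 1

def traverse(heap, x):
    """Run the loop at L1 and count how many times x := x.next executes."""
    steps = 0
    while x is not None:                           # branch x ≠ nil
        x = heap[x]["next"]                        # x := x.next
        steps += 1
    return steps                                   # x = nil, so halt


heap = {10: {"next": 20}, 20: {"next": 30}, 30: {"next": None}}
length_ok = holds_ls(heap, 3, 10, None)
iterations = traverse(heap, 10)
```

On this heap ls(3; 10, nil) holds and the loop body runs three times, which is the kind of relationship between n and the program's behaviour that the instrumentation is meant to expose.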

We will let φ0 = ls(n; x, nil) and Γ = {(L1, φ0)} and we will execute

    geninstCont(Γ, φ0, ①)

Since we have not yet presented definitions of abstract and branchAnnot, we will
adopt the following definitions for now, which trivially satisfy the specifications given in
Figure 5.7, but are not as useful as those we present later.

    abstract(φ) = (φ, ε)
    branchAnnot(φ, [e1, ..., en]) = [true, ..., true]
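Written out as code, these placeholder definitions are nearly empty. The sketch below assumes formulas and guards are strings and that the empty command ε is an empty list of commands; the snake_case name branch_annot is a Python rendering of branchAnnot.

```python
def abstract(phi):
    """Trivial abstraction: leave the formula unchanged and emit no instrumentation (φ, ε)."""
    return phi, []

def branch_annot(phi, guards):
    """Trivial branch annotation: one 'true' assumption per branch guard."""
    return ["true"] * len(guards)


annotations = branch_annot("ls(n; x, nil)", ["x ≠ nil", "x = nil"])
```

Because neither function inspects its formula argument, all of the work in the example that follows is done by frame inference.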

The first construct in our continuation is a branch, so the code for geninstCont on page
210 calls

    branchAnnot(φ0, [x ≠ nil, x = nil])

This returns [true, true]. Next, the function calls geninstCont recursively on ② and
④. The call to geninstCont(Γ, φ0 ∧ x = nil, ④) returns Some(Γ, halt). The call to
geninstCont(Γ, φ0 ∧ x ≠ nil, ②) calls instPost in order to process x := x.next.
So we now have the partial instrumentation given below, where we elide portions that have
not been generated yet and write the precondition at that point in braces. We also write
dark circle numbers to indicate those control points that have already been considered by
our algorithm.

    L1 : ❶ branch x ≠ nil ⇒ assume(true); {ls(n; x, nil) ∧ x ≠ nil} ② ...,
                  x = nil ⇒ assume(true); ❹ halt end
The instPost function notices that x := x.next is in A[x]; that is, it is a command
that requires a memory cell at x to be present in the heap. Because of this, it calls frame
inference to derive a proof of

    ls(n; x, nil) ∧ x ≠ nil ⇒^S_fk x ↦ □ // Γ ⊢ k̂

where the function fk is the function that calls partialPost and then geninstCont
on the post-condition to continue processing. Recall that the above is an abbreviation for
the following sequent.

    emp [] ls(n; x, nil) ∧ x ≠ nil ⇒^S_fk x ↦ □ // Γ ⊢ k̂

The first step of frame inference applies DefL, obtaining the following start for the
proof tree.

Clearly the sequents involved are far too long to display a full traditional proof tree
here. Instead, we will present an abbreviated tree that labels each node with the inference
rule applied at that point and also records the arguments used in any calls to $f$. We will
write the information needed to reconstruct the full rule instance to the side of the rule
name. For the matching rules, this will be the formula that is matched. For rules that
instantiate variables, this will be the substitution. For DefL, this will be the predicate
instance expanded. The context and instrumented continuation that are returned by each
rule are listed below it. We write $\Gamma_1$ and $k_1$ to refer to the context and continuation returned
by the first (leftmost) child in the tree, $\Gamma_2$, $k_2$ to refer to the second, etc. Figure 5.12 gives
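the derivation tree.

Derivation tree, listed from the root upward:

DefL ($\mathsf{ls}(n; x, \mathsf{nil})$), returning the context $\Gamma_1 \cup \Gamma_2$ and the continuation
    branch n = 0 ⇒ assume(true); k1,
           n > 0 ⇒ n0 := ?; assume(n = n0 + 1); k2 end

  Left child: LeftPureFalse, returning the continuation assume(false); halt

  Right child: ExistsL [a/z], then PtoMatchesAny ($x \mapsto [\mathsf{next} : a]$), then RightPure,
  ending with the call
    $f(\exists a.\ (x \mapsto [\mathsf{next} : a] * \mathsf{ls}(n_0; a, \mathsf{nil})) \land x \neq \mathsf{nil})$
  which returns a context $\Gamma$ and a continuation $k$.
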

Figure  5.12:  Proof  for  the  frame  inference  query 

ls(n ;  x,  nil)  A  x  /  ni  i  =»/fc  xAD//rhfc 

We use $\Gamma_1$, $k_1$ to refer to the results from the left branch and $\Gamma_2$, $k_2$ to refer to the results from the
right branch.

Combining  this  with  what  we  had  before,  we  have  now  built  up  the  following  partial 
continuation. 

L1 : ❶ branch x ≠ nil ⇒ assume(true);
                  branch n = 0 ⇒ assume(true); assume(false); halt,
                         n > 0 ⇒ n0 := ?; assume(n = n0 + 1);
                                 {$\exists a.\ (x \mapsto [\mathsf{next} : a] * \mathsf{ls}(n_0; a, \mathsf{nil})) \land x \neq \mathsf{nil}$}
                                 (2) ... end,
                x = nil ⇒ assume(true); ❹ halt end

We now execute partialPost to find the post-condition of the invariant at control
location (2), reproduced below

\[ \exists a.\ (x \mapsto [\mathsf{next} : a] * \mathsf{ls}(n_0; a, \mathsf{nil})) \land x \neq \mathsf{nil} \]

with respect to the command x := x.next. This results in the formula below.

\[ \exists a, x'.\ (x' \mapsto [\mathsf{next} : a] * \mathsf{ls}(n_0; a, \mathsf{nil})) \land x' \neq \mathsf{nil} \land x = a \]

If we perform some simplification, we obtain the formula below.

\[ \exists x'.\ (x' \mapsto [\mathsf{next} : x]) * \mathsf{ls}(n_0; x, \mathsf{nil}) \tag{5.12} \]

The next command encountered is the goto $L_1$ command, which causes geninstCont
to compare the current state against the invariants that have been collected in $\Gamma$. The only
invariant currently in $\Gamma$ and associated with location $L_1$ is the following.

\[ \mathsf{ls}(n; x, \mathsf{nil}) \]

This is not implied by (5.12) because, while we can match $\mathsf{ls}(n_0; x, \mathsf{nil})$ against $\mathsf{ls}(n; x, \mathsf{nil})$
by inserting the instrumentation command n := n0, we cannot match the portion of the
heap described by $x' \mapsto [\mathsf{next} : x]$. The current formula thus represents states not satisfied
by the previous formula at $L_1$, and geninstCont indicates that we should apply
abstract, add the result to $\Gamma$, and then continue processing from this new state.

Here we see the problem with the simple version of abstract we defined earlier.
With abstract defined to be the identity function, we will never converge on a finite set
of invariants associated with $L_1$ that describe all the reachable states of this program.

To show that this is the case, we list the next two invariants that the analysis will
discover associated with $L_1$.

\[ \exists x', x_2.\ (x' \mapsto [\mathsf{next} : x_2]) * (x_2 \mapsto [\mathsf{next} : x]) * \mathsf{ls}(n_2; x, \mathsf{nil}) \]

\[ \exists x', x_2, x_3.\ (x' \mapsto [\mathsf{next} : x_2]) * (x_2 \mapsto [\mathsf{next} : x_3]) * (x_3 \mapsto [\mathsf{next} : x]) * \mathsf{ls}(n_3; x, \mathsf{nil}) \]

The symbolic state formulae that we generate continue to contain more and more points-to
predicates that are not part of the list from $x$ to nil.

This highlights the importance of the abstract function. Without it, the algorithm
does not terminate. But with a well-chosen abstract, as we will see in the next section,
the algorithm is able to converge on fixed-points for many programs.
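The divergence described above can be imitated with a deliberately tiny model. The sketch below is an illustration only, not the analysis itself: a symbolic state is reduced to a count of the points-to cells left behind x, equality of states stands in for the entailment check against $\Gamma$, and the names State, loop_body_post, abstract_fold and analyze are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    # Number of points-to cells left behind x; the rest of the heap is ls(_; x, nil).
    cells_behind_x: int


def loop_body_post(state):
    # Unfold ls at x (non-empty case), then run x := x.next:
    # one more points-to cell is left behind x.
    return State(state.cells_behind_x + 1)


def abstract_identity(state):
    # The placeholder abstract: returns the formula unchanged.
    return state


def abstract_fold(state):
    # Hypothetical stand-in for a real abstraction: everything behind x is
    # folded into a single list segment, so at most one "cell" is recorded.
    return State(min(state.cells_behind_x, 1))


def analyze(abstract, max_iters):
    invariants = [State(0)]            # Gamma initially holds ls(n; x, nil)
    current = State(0)
    for _ in range(max_iters):
        current = abstract(loop_body_post(current))
        if current in invariants:      # equality stands in for the entailment check
            return invariants, True
        invariants.append(current)
    return invariants, False


grown, converged_identity = analyze(abstract_identity, max_iters=5)
folded, converged_fold = analyze(abstract_fold, max_iters=5)
```

With the identity abstraction the invariant list gains a new entry on every pass and the iteration budget runs out; with the folding stand-in the second pass already reproduces a stored invariant.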


5.7  Abstraction 

The final component necessary before we can present a full example run of the algorithm
is the framework for performing abstraction. This is similar to the summarization step
in TVLA [Sagiv et al., 2002] and corresponds to the abstraction function used in abstract
interpretation [Cousot and Cousot, 1977].

The motivation for abstraction is that if we only perform post-condition computation
and unroll inductive predicates on the left, we will never converge on a finite set of invariants,
as we saw in the previous section. Abstraction solves this problem by occasionally
and intentionally forgetting information about our current symbolic state formula in order to
allow it to cover more concrete states. The term abstraction refers to the fact that this
operation results in a more abstract (weaker) formula.

To give a simple example, consider one of the states we generated when looking at the
example in the previous section.

\[ \exists x'.\ (x' \mapsto [\mathsf{next} : x]) * \mathsf{ls}(n_0; x, \mathsf{nil}) \]

The formula $x' \mapsto [\mathsf{next} : x]$ describes a list segment of length one. That is, every
concrete stack and heap pair that satisfies $x' \mapsto [\mathsf{next} : x]$ also satisfies $\mathsf{ls}(1; x', x)$.
We are thus free to apply Strengthening to switch the current state formula from
$\exists x'.\ (x' \mapsto [\mathsf{next} : x]) * \mathsf{ls}(n_0; x, \mathsf{nil})$ to $\exists x'.\ \mathsf{ls}(1; x', x) * \mathsf{ls}(n_0; x, \mathsf{nil})$ before storing
the state in $\Gamma$. This is what abstract will do: return a different formula that is implied
by the formula supplied as input.

The transformation just described is not enough, however, to cause the analysis to
terminate. We will simply obtain the sequence of states

\[ \exists x'.\ \mathsf{ls}(1; x', x) * \mathsf{ls}(n_0; x, \mathsf{nil}) \]

\[ \exists x'.\ \mathsf{ls}(2; x', x) * \mathsf{ls}(n_0; x, \mathsf{nil}) \]

\[ \exists x'.\ \mathsf{ls}(3; x', x) * \mathsf{ls}(n_0; x, \mathsf{nil}) \]

We need to forget the length as well before we can obtain a formula weak enough to
describe all reachable states. One way to do this would be to existentially quantify the
length, obtaining the invariant

\[ \exists n, x'.\ \mathsf{ls}(n; x', x) * \mathsf{ls}(n_0; x, \mathsf{nil}) \]

However, we can also use an instrumentation variable to capture the fact that the length
is changing. This provides a more precise abstraction, as we will record instrumentation
commands describing exactly how the changes to the length occur (in this case, we will
record that the length of this segment increases by one each time we reach $L_1$).

Because we must describe exactly how an instrumentation variable is updated, this
method requires more care than the use of an existential variable. However, as we will see,
all the information we need is already present in the form of our inductive specifications.

5.7.1  Abstraction  Patterns 

We will derive formulae termed abstraction patterns from the cases of our inductive
specifications. These describe exactly how to replace some portion of the state formula with
an instance of an inductively specified predicate.

We will again take the singly-linked list specification as our example.

\[
\begin{array}{l}
\mathsf{ls}(n; x, y) \Leftrightarrow \\
\quad\ \ n = 0 : \mathbf{let}\ [\,]\ \mathbf{satisfy}\ \mathsf{true}\ \mathbf{in}\ \mathsf{emp} \land x = y \\
\quad \mid n > 0 : \mathbf{let}\ n'\ \mathbf{satisfy}\ n = n' + 1\ \mathbf{in} \\
\quad\qquad \exists z.\ (x \mapsto [\mathsf{next} : z]) * \mathsf{ls}(n'; z, y)
\end{array}
\]



We first consider the $n > 0$ case. Reading the equivalence from right to left, this states
that if the heap contains $x \mapsto [\mathsf{next} : z]$ for some $z$ and separately contains $\mathsf{ls}(n'; z, y)$ for
the same $z$, then this can be viewed as $\mathsf{ls}(n; x, y)$ for some $n$ such that $n = n' + 1$. This
allows us to replace $(x \mapsto [\mathsf{next} : z]) * \mathsf{ls}(n'; z, y)$ with $\mathsf{ls}(n; x, y)$ provided we also update
the instrumentation variables appropriately. The main issue in terms of implementation of
such a replacement method is how to perform the initial matching. That is, how do we
determine the instantiation of bound variables in the inductive specification that results in
an applicable instance of the rule? Our matching will be guided by the spatial formulae
present in the specification and in the current state.

For the example of the non-empty case of the singly-linked list predicate, we want to
search for a sub-formula of the current state, call it $\varphi$, that has the form below.

\[ (e_1 \mapsto [\mathsf{next} : e_2]) * \mathsf{ls}(e_4; e_2, e_3) \]

Once we have found such a sub-formula, we can replace it with $\mathsf{ls}(n; e_1, e_3)$ provided that
the following pattern condition holds

\[ \varphi \Rightarrow \exists n.\ n = e_4 + 1 \land n > 0 \]

The  reason  for  this  check  is  that  we  could  have  a  predicate  such  as  the  one  below,  which 
describes  lists  of  length  less  than  5. 

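\[
\begin{array}{l}
\mathsf{ls}(n; x, y) \Leftrightarrow \\
\quad\ \ n = 0 : \mathbf{let}\ [\,]\ \mathbf{satisfy}\ \mathsf{true}\ \mathbf{in}\ \mathsf{emp} \land x = y \\
\quad \mid n > 0 \land n < 5 : \mathbf{let}\ n'\ \mathbf{satisfy}\ n = n' + 1\ \mathbf{in} \\
\quad\qquad \exists z.\ (x \mapsto [\mathsf{next} : z]) * \mathsf{ls}(n'; z, y)
\end{array}
\]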

Such  a  specification  cannot  always  be  applied  right-to-left  even  if  the  spatial  portion  of 
one  of  the  cases  can  be  matched.  In  practice,  we  have  never  needed  to  work  with  such  a 
specification.  All  the  specifications  we  have  written  while  running  our  experiments  have 
the  property  that  the  check  above  is  always  true.  We  will  state  the  theory  in  terms  of  the 
general  case,  which  requires  this  check.  But  it  is  useful  to  avoid  it  whenever  possible  in 
the  implementation,  as  proving  pure  implications  involving  existential  quantification  on 
the right can be a slow process for many theorem provers.

We now consider the general case. Recall that a case of a specification has the form
below

\[ \Pi : \mathbf{let}\ \vec{z}\ \mathbf{satisfy}\ \Pi'\ \mathbf{in}\ \exists \vec{x}_1.\ \Sigma \land \Pi'' \]

and is abbreviated as $C(\vec{x}; \vec{y})$, where $\vec{x}$ is the list of instrumentation parameters for the
definition and $\vec{y}$ is the list of non-instrumentation parameters. The meaning of this case as
a separation logic formula is the following

\[ \Pi \land \exists \vec{z}.\ (\Pi' \land \exists \vec{x}_1.\ \Sigma \land \Pi'') \]

which we write $\lceil C(\vec{x}; \vec{y}) \rceil$.

When matching such a case against a symbolic state, most of the variables will be
interpreted existentially, as they were in our example above. To see why, consider the
reasoning process we are trying to establish in executing this replacement. For some case
$C(\vec{x}; \vec{y})$ of an inductive predicate $d(\vec{x}; \vec{y})$, and some symbolic state formula $\varphi$, we want to
show the following.

\[ \varphi \Rightarrow (\varphi' * \lceil C(\vec{e}_1; \vec{e}_2) \rceil) \Rightarrow (\varphi' * d(\vec{e}_1; \vec{e}_2)) \tag{5.13} \]

In the first implication, $C(\vec{e}_1; \vec{e}_2)$ appears on the right, so we get to choose terms not just for
the parameters, but also for any existentially quantified variables in the body of the case.
This includes $\vec{x}_1$ and also $\vec{z}$, as these appear existentially quantified in the representation
of the case as a separation logic formula.

Though these variables are all existential in nature, they do serve different roles, motivated
by our desire to use this rewriting process to produce formulae that are more likely
to be invariants across multiple iterations of loops. As we saw with the list example, where
we obtained a list of length 1, then length 2, then 3, etc., the instrumentation parameters $\vec{x}$
can interfere with the discovery of a loop invariant. Furthermore, it is difficult to find the
list of expressions $\vec{e}_1$ that witness the validity of the implication in (5.13), as $\vec{e}_1$ may be an
arithmetic expression not occurring in $\varphi$.

To remedy both these issues, we instead use the following line of reasoning.

\[ \varphi \Rightarrow \exists \vec{x}.\ (\varphi' * \lceil C(\vec{x}; \vec{e}_2) \rceil) \Rightarrow \exists \vec{x}.\ (\varphi' * d(\vec{x}; \vec{e}_2)) \]

We then insert the instrumentation command $\vec{x} := ?$ to eliminate the existential on $\vec{x}$.
As we will see when we present the details, we also want to record at this point some
assumption linking $\vec{x}$ to other instrumentation variables. Following this line of reasoning
ensures that the symbolic state formulae generated by abstraction always contain variables
in the instrumentation parameter positions. This will make it easier to use the InstL rule in
our frame inference system to find instrumentation commands that allow us to re-establish
a previously discovered invariant.

Another issue we must take care to avoid is the production of a formula that is too
weak to be useful in further analysis of the program. To see an example of this, consider
the invariant we obtained at $L_1$ after a single pass of analysis of our example list traversal
program. We had

\[ \exists x'.\ (x' \mapsto [\mathsf{next} : x] * \mathsf{ls}(n_0; x, \mathsf{nil})) \]

We noted previously that this formula implies

\[ \exists x'.\ \mathsf{ls}(1; x', x) * \mathsf{ls}(n_0; x, \mathsf{nil}) \]

However, it also implies

\[ \exists x'.\ \mathsf{ls}(n_0 + 1; x', \mathsf{nil}) \]

But pushing this formula through the analysis will quickly lead us to trouble. The formula
does not say anything about $x$, and so when we next try to execute x := x.next we are
unable to show that $x$ exists in the heap.

The reason we lost track of $x$ is that we matched $x$ to a variable that did not occur in
the parameter list of the predicate. When we replace some piece of the formula representing
the body of a case with an instance of an inductive predicate, we only retain spatial
information about expressions occurring as parameters of that definition. In [Magill et al.,
2006] we introduced a condition on abstraction rewrites that avoids this case. If we want
to replace a piece of heap with an inductive predicate instance using a case of the form
below

\[ \Pi : \mathbf{let}\ \vec{z}\ \mathbf{satisfy}\ \Pi'\ \mathbf{in}\ \exists \vec{x}_1.\ \Sigma \land \Pi'' \]

the expressions corresponding to $\vec{x}_1$ must not contain program variables. Distefano et al.
[2006] present a stronger condition that also requires that variables in the expressions
corresponding to $\vec{x}_1$ must not appear elsewhere in the spatial portion of the state. This stronger
condition is important in more complicated sharing patterns. Consider the symbolic state
below.

\[ \exists z.\ \mathsf{ls}(n_1; x, z) * \mathsf{ls}(n_2; y, z) * \mathsf{ls}(n_3; z, \mathsf{nil}) \]

Suppose we had a specification like the one below

\[
\begin{array}{l}
\mathsf{ls}(n; x, y) \Leftrightarrow \\
\quad \mathsf{true} : \mathbf{let}\ n_1, n_2\ \mathbf{satisfy}\ n = n_1 + n_2\ \mathbf{in} \\
\quad\qquad \exists z.\ \mathsf{ls}(n_1; x, z) * \mathsf{ls}(n_2; z, y)
\end{array}
\]

The weaker condition would then allow us to replace $\mathsf{ls}(n_2; y, z) * \mathsf{ls}(n_3; z, \mathsf{nil})$ with
$\mathsf{ls}(n_2 + n_3; y, \mathsf{nil})$ obtaining

\[ \exists z.\ \mathsf{ls}(n_1; x, z) * \mathsf{ls}(n_2 + n_3; y, \mathsf{nil}) \]

This formula loses the information about $x$ and $y$ eventually reaching the same heap cell.
This does not affect soundness, but would cause problems when, for example, traversing
the list at $x$, as we would be unable to show memory safety beyond the point where $x$
reaches $z$. The stronger condition would prevent us from combining these lists since $z$,
the variable that is disappearing, occurs in $\mathsf{ls}(n_1; x, z)$, which does not participate in the
replacement. We use the stronger condition in the presentation here and in our implementation.
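The difference between the two conditions can be made concrete on the sharing example. The sketch below is a simplified reading, not the implementation: spatial atoms are reduced to a name plus their non-instrumentation arguments, both conditions are phrased in terms of the variables that disappear under the rewrite, and the helper names (heap_vars, disappearing, weaker_ok, stronger_ok) are hypothetical.

```python
NIL = "nil"


def heap_vars(atoms):
    # Variables in the non-instrumentation argument positions of spatial atoms.
    return {arg for _name, args in atoms for arg in args if arg != NIL}


def disappearing(matched, replacement):
    # Variables mentioned by the matched sub-heap but not by its replacement.
    return heap_vars(matched) - heap_vars(replacement)


def weaker_ok(matched, replacement, quantified):
    # Weaker condition: only existentially quantified variables may disappear.
    return disappearing(matched, replacement) <= set(quantified)


def stronger_ok(matched, replacement, frame, quantified):
    # Stronger condition: additionally, a disappearing variable must not occur
    # in the part of the heap that does not take part in the replacement.
    gone = disappearing(matched, replacement)
    return gone <= set(quantified) and not (gone & heap_vars(frame))


# exists z. ls(n1; x, z) * ls(n2; y, z) * ls(n3; z, nil)
matched = [("ls", ("y", "z")), ("ls", ("z", "nil"))]   # ls(n2; y, z) * ls(n3; z, nil)
replacement = [("ls", ("y", "nil"))]                   # ls(n2 + n3; y, nil)
frame = [("ls", ("x", "z"))]                           # ls(n1; x, z)

weak = weaker_ok(matched, replacement, quantified={"z"})
strong = stronger_ok(matched, replacement, frame, quantified={"z"})
```

Under these assumptions the weaker check accepts the rewrite, while the stronger check rejects it because z still occurs in the untouched segment from x.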

Now that the motivation for the various checks is clear, we will present the general
form of an abstraction pattern. The pattern will have the format below.

\[ [\vec{v}]\ (\Sigma) \xrightarrow[\Pi']{\Pi}_{\mathrm{PAT}} (\Sigma')\ [\vec{x}] \]

The variables in $\vec{v}$ can be instantiated with expressions when matching the pattern. The
formula $\Sigma$ gives the spatial formula that should be matched. The formula $\Pi$ gives the pattern
condition that must hold for the rewrite to be applicable. The variables $\vec{x}$ are the new
instrumentation variables that will be introduced, and the formula $\Pi'$ gives the relationship
between the new instrumentation variables and the old instrumentation variables present
in $\Sigma$. The formula $\Sigma'$ is the replacement for the spatial formula $\Sigma$. The variables $\vec{v}$ and $\vec{x}$
are considered bound. We derive such a pattern from a case of an inductive specification
as follows.

Definition 34. Let $C(\vec{x}; \vec{y})$ be a case of an inductive specification of predicate $d$ and suppose
$C(\vec{x}; \vec{y})$ has the following form, where the variables $\vec{z}$, $\vec{x}_1$, $\vec{x}$, and $\vec{y}$ are all distinct.

\[ \Pi : \mathbf{let}\ \vec{z}\ \mathbf{satisfy}\ \Pi'\ \mathbf{in}\ \exists \vec{x}_1.\ \Sigma \land \Pi'' \]

Then the abstraction pattern associated with $C(\vec{x}; \vec{y})$ is

\[ [\vec{x}_1, \vec{z}, \vec{y}]\ (\Sigma) \xrightarrow[\Pi']{\Pi \land \Pi' \land \Pi''}_{\mathrm{PAT}} (d(\vec{x}; \vec{y}))\ [\vec{x}] \]


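We expect patterns to obey the following soundness criterion.

Definition 35. A pattern $[\vec{x}]\ (\Sigma) \xrightarrow[\Pi']{\Pi}_{\mathrm{PAT}} (\Sigma')\ [\vec{y}]$ is sound iff $\vec{x}$ and $\vec{y}$ are all distinct,
$\vec{y} \cap \mathit{fv}(\Sigma) = \emptyset$, and

\[ \forall \vec{x}.\ \Sigma \land (\exists \vec{y}.\ \Pi) \Rightarrow \exists \vec{y}.\ \Sigma' \land \Pi' \]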


We  then  have  the  following  theorem  regarding  our  method  for  translating  cases  to 
patterns. 

Theorem  31.  The  method  given  as  Definition  34  for  converting  a  case  of  an  inductive 
specification  to  an  abstraction  pattern  is  sound. 


Proof. The condition that the variables are distinct and the condition that the new
instrumentation variables are not free in $\Sigma$ both follow from the same conditions on the syntax
of our inductive specifications (see Figure 5.3).

For the main soundness condition, recall that an inductive specification

\[ d(\vec{x}; \vec{y}) = C_1 \mid \ldots \mid C_n \]

is interpreted as the separation logic formula

\[ \forall \vec{x}, \vec{y}.\ d(\vec{x}; \vec{y}) \Leftrightarrow \lceil C_1 \rceil \lor \ldots \lor \lceil C_n \rceil \]

This implies

\[ \forall \vec{x}, \vec{y}.\ \lceil C_i \rceil \Rightarrow d(\vec{x}; \vec{y}) \]

And this is the formula on which we will base the soundness argument.

Instantiating this with the particular $C_i$ from Definition 34 we obtain

\[ \forall \vec{x}, \vec{y}.\ (\Pi \land \exists \vec{z}.\ \Pi' \land \exists \vec{x}_1.\ \Sigma \land \Pi'') \Rightarrow d(\vec{x}; \vec{y}) \]

The restrictions on $\mathit{fv}(\Pi)$ and $\mathit{fv}(\Pi')$ in Figure 5.3 on page 193 give us that $\vec{z} \cap \mathit{fv}(\Pi) = \emptyset$
and $\vec{x}_1 \cap \mathit{fv}(\Pi, \Pi') = \emptyset$. This lets us rewrite the above as

\[ \forall \vec{x}, \vec{y}.\ (\exists \vec{z}, \vec{x}_1.\ \Pi \land \Pi' \land (\Sigma \land \Pi'')) \Rightarrow d(\vec{x}; \vec{y}) \tag{5.14} \]

This implication is available for use since it follows from one of the inductive specifications
and all reasoning is done under the assumption that the inductive specifications
hold.

To show soundness of the abstraction pattern, we must show the following.

\[ \forall \vec{x}_1, \vec{z}, \vec{y}.\ \Sigma \land (\exists \vec{x}.\ \Pi \land \Pi' \land \Pi'') \Rightarrow \exists \vec{x}.\ d(\vec{x}; \vec{y}) \land \Pi' \]

We consider some arbitrary $\vec{x}_1, \vec{z}, \vec{y}$ and assume $\Sigma \land (\exists \vec{x}.\ \Pi \land \Pi' \land \Pi'')$. Since $\vec{x} \cap \mathit{fv}(\Sigma) = \emptyset$
we can move the quantifier on $\vec{x}$ to the outside, obtaining

\[ \exists \vec{x}.\ \Sigma \land (\Pi \land \Pi' \land \Pi'') \]

Eliminating the existential quantifier on $\vec{x}$ and applying (5.14) then gives us

\[ d(\vec{x}; \vec{y}) \]

We already have $\Pi'$, so we can obtain

\[ d(\vec{x}; \vec{y}) \land \Pi' \]

Then we re-introduce the existential quantifier on $\vec{x}$, obtaining

\[ \exists \vec{x}.\ d(\vec{x}; \vec{y}) \land \Pi' \]

which is our goal. □


5.7.2 Empty Patterns

In the discussion above, we concentrated on patterns that arose from the non-empty cases
of our inductive specifications. Patterns based on empty cases pose a problem for automation
because the spatial formula emp can be found in any symbolic state. Thus, patterns
derived from empty cases would always be applicable. As a result, we do not generate
patterns from empty cases. However, we need to include some sort of pattern derived
from the base case or we will never be able to introduce instances of inductive predicates.
Consider a routine that creates a linked list. We will get states like the following

\[ x \mapsto [\mathsf{next} : \mathsf{nil}] \]

\[ \exists x_1.\ x \mapsto [\mathsf{next} : x_1] * x_1 \mapsto [\mathsf{next} : \mathsf{nil}] \]

\[ \exists x_1, x_2.\ x \mapsto [\mathsf{next} : x_2] * x_2 \mapsto [\mathsf{next} : x_1] * x_1 \mapsto [\mathsf{next} : \mathsf{nil}] \]

and with no way to introduce an instance of the list predicate, we will never find a finite
description of all these states.

One solution is to have the user provide a creation pattern for each data structure. For
example, for a linked list, they could provide

\[ [x, y]\ (x \mapsto [\mathsf{next} : y]) \xrightarrow[k = 1]{}_{\mathrm{PAT}} (\mathsf{ls}(k; x, y))\ [k] \]

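However, such patterns can also be generated automatically by expanding inductive predicates
repeatedly. For example, suppose we take the doubly-linked list definition below.

\[
\begin{array}{l}
\mathsf{dll}(k; p, \mathit{first}, \mathit{last}, n) \Leftrightarrow \\
\quad\ \ k = 0 : \mathbf{let}\ [\,]\ \mathbf{satisfy}\ \mathsf{true}\ \mathbf{in}\ \mathsf{emp} \land \mathit{first} = n \land \mathit{last} = p \\
\quad \mid k > 0 : \mathbf{let}\ k'\ \mathbf{satisfy}\ k = k' + 1\ \mathbf{in} \\
\quad\qquad \exists z.\ (\mathit{first} \mapsto [\mathsf{prev} : p, \mathsf{next} : z]) * \mathsf{dll}(k'; \mathit{first}, z, \mathit{last}, n)
\end{array}
\]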

We can expand the predicate $\mathsf{dll}(k; a, b, c, d)$ once using the non-empty case, obtaining

\[ k > 0 \land \exists k'.\ k = k' + 1 \land \exists z.\ (b \mapsto [\mathsf{prev} : a, \mathsf{next} : z]) * \mathsf{dll}(k'; b, z, c, d) \]

and then expand $\mathsf{dll}(k'; b, z, c, d)$ using the empty case, obtaining

\[ k > 0 \land \exists k'.\ k = k' + 1 \land \exists z.\ (b \mapsto [\mathsf{prev} : a, \mathsf{next} : z]) \land (k' = 0 \land b = c \land z = d) \]

We now have a description of a list segment that contains no inductive instances of the dll
predicate but describes a non-empty heap. We can translate this into the following creation
pattern.

\[ [a, b, c, d, z]\ (b \mapsto [\mathsf{prev} : a, \mathsf{next} : z]) \xrightarrow[(k = k' + 1) \land (k' = 0)]{(k = k' + 1) \land (k' = 0 \land b = c \land z = d)}_{\mathrm{PAT}} (\mathsf{dll}(k; a, b, c, d))\ [k, k'] \]

Now suppose we are faced with a state such as the following.

\[ x \mapsto [\mathsf{prev} : \mathsf{nil}, \mathsf{next} : y] \]

We can apply the pattern above by using the substitution $a \to \mathsf{nil}$, $b \to x$, $c \to x$, $d \to y$, $z \to y$.
To make the pattern more useful for automation, it helps to eliminate the variable $z$ and
propagate the equality $b = c$. Propagating the equality $k' = 0$ is also helpful as this results
in fewer instrumentation variables. Applying these simplifications leaves us with the
pattern below.

\[ [a, b, d]\ (b \mapsto [\mathsf{prev} : a, \mathsf{next} : d]) \xrightarrow[k = 1]{k = 1}_{\mathrm{PAT}} (\mathsf{dll}(k; a, b, b, d))\ [k] \]

The pattern condition in this case is equivalent to true (soundness for abstraction patterns
requires that $\exists k.\ k = 1$ hold in this case, but this is a tautology). This enables us to
simplify the pattern even further.

\[ [a, b, d]\ (b \mapsto [\mathsf{prev} : a, \mathsf{next} : d]) \xrightarrow[k = 1]{}_{\mathrm{PAT}} (\mathsf{dll}(k; a, b, b, d))\ [k] \]

Our implementation attempts to discover when pattern conditions are tautologies and apply
this simplification, as avoiding the theorem proving call associated with checking the
pattern condition each time the pattern is applied significantly decreases execution time.
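As a rough illustration of how the simplified creation pattern might be applied, the sketch below rewrites a single doubly-linked cell into a dll instance. It assumes a toy representation (a cell is an address paired with a field dictionary) and assumes that the recorded instrumentation commands take the form of a nondeterministic assignment followed by an assume; the function name apply_dll_creation is hypothetical.

```python
def apply_dll_creation(cell, fresh="k"):
    """Rewrite  b |-> [prev : a, next : d]  into  dll(k; a, b, b, d).

    Returns the new predicate instance together with the instrumentation
    commands that introduce the fresh variable and record k = 1.
    """
    address, fields = cell
    a, b, d = fields["prev"], address, fields["next"]
    predicate = ("dll", fresh, a, b, b, d)
    # Assumed command shape: havoc the new variable, then constrain it.
    commands = [f"{fresh} := ?", f"assume({fresh} = 1)"]
    return predicate, commands


# The state  x |-> [prev : nil, next : y]  from the text.
predicate, commands = apply_dll_creation(("x", {"prev": "nil", "next": "y"}))
```

Because the pattern condition has been simplified away, no theorem prover call appears in this sketch; only the relation k = 1 is recorded.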
5.7.3 Applying Abstraction Patterns


Now that we have shown how to derive abstraction patterns from inductive predicate
specifications, we will show how these patterns are used to abstract a symbolic state formula.
In Figure 5.13 we define a relation with syntax $\varphi \xrightarrow{\mathrm{ABS}} (\varphi' \mid c)$. This relation takes a
symbolic state formula $\varphi$ to a pair consisting of a weaker formula $\varphi'$ and $c$, the sequence
of instrumentation commands necessary to generate $\varphi'$ from $\varphi$ (the empty command list $\epsilon$
is used if $\varphi'$ follows from $\varphi$ by Strengthening). The rules are parametrized by the set of
abstraction patterns $A$. Note that the side condition of the first rule can always be satisfied
by renaming bound variables, as the variables $\vec{y}$ are bound in the abstraction pattern. We
show on page 275 the code for abstract, which uses the relation just described.

The formal specification of

\[ \varphi \xrightarrow{\mathrm{ABS}} (\varphi' \mid c) \]

is  that  this  should  hold  only  if  for  all  T,  /c,  k, 

T  b  {p'}  k  ►iVar  k 


implies 

T  b  {p}  (C  °9k)  ►  IVar  k 


First  Rule 

The first rule in Figure 5.13 has a number of premises. We go through each of them here,
explaining its function. First we present a guide to the notation in the figure, using a
linked list example. Below is an abstraction pattern that replaces two list-structured heap
cells with an instance of the list predicate.

\[ [x, y, z]\ (x \mapsto [\mathsf{next} : y] * y \mapsto [\mathsf{next} : z]) \xrightarrow[k = 2]{}_{\mathrm{PAT}} (\mathsf{ls}(k; x, z))\ [k] \]

We will show how to apply this pattern to the symbolic state below (and several variations
on this state).

\[ \varphi_0 = \exists b.\ a \mapsto [\mathsf{next} : b] * b \mapsto [\mathsf{next} : \mathsf{nil}] * c \mapsto [\mathsf{next} : b] \land g > 0 \]

We now describe each meta-variable present in the first rule in Figure 5.13.

$\varphi$  The symbolic state formula that is being abstracted. For our example, this is
   $\varphi_0$, defined above.

$\Sigma$  The left-hand side of the rewrite rule. Specifies the pattern to search $\varphi$ for. In
   our example, this is $x \mapsto [\mathsf{next} : y] * y \mapsto [\mathsf{next} : z]$.

$\Sigma'$  The right-hand side of the rewrite rule. Specifies the replacement for $\Sigma$. In our
   example, this is $\mathsf{ls}(k; x, z)$.

$\vec{x}$  The list of variables in the pattern that can be instantiated to expressions. In
   our example this is $x, y, z$. This can also include instrumentation variables if
   these are available for replacement.

$\sigma$  The substitution that makes some portion of $\varphi$ match $\Sigma$. Its domain is $\vec{x}$. In our
   example, this substitution will be $x \to a$, $y \to b$, $z \to \mathsf{nil}$ (other matchings are
   also possible; the abstraction process is non-deterministic and any matching
   pattern can be chosen and applied without affecting soundness).

$\Sigma_0$  The spatial portion of $\varphi$ not matched by the pattern. This is $c \mapsto [\mathsf{next} : b]$ in
   our example.

$\Pi_0$  This is the pure portion of $\varphi$. In our example this is $g > 0$.

$\vec{x}_0$  The list of quantified variables in $\varphi$. In our example, this is the singleton $b$.

$\Pi$  The condition that must hold in order for the replacement to occur. This is in
   addition to the premises on free variables that occur as preconditions in the
   first abstraction rule. In our example, this is true.

$\vec{y}$  The list of new instrumentation variables that are introduced by this pattern.
   In our example, this is $k$.

$\Pi'$  The relation between instrumentation variables in $\Sigma$ and the new variables $\vec{y}$.
   In our example this is $k = 2$.

We  now  discuss  each  premise  of  the  first  rule  in  Figure  5.13. 

\[ (\mathit{fv}(\sigma(\Sigma)) - \mathit{fv}(\sigma(\Sigma'))) \subseteq \vec{x}_0 \]

The difference $\mathit{fv}(\sigma(\Sigma)) - \mathit{fv}(\sigma(\Sigma'))$ gives the set of free variables that disappear from
the formula when applying the pattern. In our example, the difference evaluates to $b$,
indicating that by combining $a \mapsto [\mathsf{next} : b] * b \mapsto [\mathsf{next} : \mathsf{nil}]$ into the predicate instance
$\mathsf{ls}(k; a, \mathsf{nil})$, we lose track of where $b$ is pointing. The $\subseteq \vec{x}_0$ portion of this check ensures
that the variables that are disappearing are existentially quantified. We want to avoid having
non-quantified variables disappear, as these correspond to program variables, which
may be dereferenced by later commands. In our example, this check passes, since $b$ is
quantified.

\[ (\mathit{fv}(\sigma(\Sigma)) - \mathit{fv}(\sigma(\Sigma'))) \cap \mathit{fv}(\Sigma_0) = \emptyset \]

This condition checks that the variables disappearing do not appear free in the portion
of $\varphi$ that is not participating in the replacement. In our example, this check fails,
since $b$ occurs in the predicate $c \mapsto [\mathsf{next} : b]$. We want to avoid losing track of such
shared points of reference, as they can also later be accessed by heap commands. Suppose
we were to perform our example replacement in spite of this check failing. Then we
would obtain $\mathsf{ls}(k; a, \mathsf{nil}) * c \mapsto [\mathsf{next} : b]$. In such a state, if we execute the commands
v := c.next; v := v.next we will be unable to show that the second heap lookup is safe
because we have lost track of the fact that $b$ is in the middle of the two-element list at $a$.

In order to allow this check to pass and continue examining the other conditions, we
will change our example state to the following, which changes the value of the next field
of $c$ so that it no longer points into the list.

\[ \varphi_0 = \exists b.\ a \mapsto [\mathsf{next} : b] * b \mapsto [\mathsf{next} : \mathsf{nil}] * c \mapsto [\mathsf{next} : \mathsf{nil}] \land g > 0 \]

\[ \mathit{dom}(\sigma) = \vec{x} \]

This condition simply checks that we are only performing substitutions on variables
that are bound in the pattern.

\[ \varphi = \exists \vec{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \land \Pi_0 \]

This premise separates $\varphi$ into the portion that satisfies the pattern, $\sigma(\Sigma)$, and the rest,
$\Sigma_0$ and $\Pi_0$. In our example, $a \mapsto [\mathsf{next} : b] * b \mapsto [\mathsf{next} : \mathsf{nil}]$ corresponds to $\sigma(\Sigma)$.

\[ \varphi \Rightarrow \exists \vec{y}.\ \sigma(\Pi) \]

This premise checks that the symbolic state being rewritten satisfies the pattern condition
$\Pi$. In our example, $\Pi$ is true, so there is nothing to check here. The predicates
we have encountered in our experiments have all had conditions of true. However, it is
easy to construct examples whose abstraction rules require this check to be performed. An
example of such a predicate is given on page 260.
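To see when the pattern condition matters, the sketch below runs a bounded, brute-force version of the check for the non-empty list case, whose condition has the form "exists n. n = e4 + 1 and guard(n)". This is only an illustration under stated assumptions: a real implementation would pose the implication to a theorem prover, the state's pure part is reduced to a predicate on the single expression e4, and the names pattern_condition_holds, state_allows and guard are hypothetical.

```python
def pattern_condition_holds(state_allows, guard, bound=50):
    # Bounded check of:  state_allows(e4)  implies  exists n. n = e4 + 1 and guard(n).
    # The only candidate witness for n is e4 + 1, so the existential collapses.
    for e4 in range(bound):
        if state_allows(e4) and not guard(e4 + 1):
            return False
    return True


def unbounded_guard(n):
    # Guard of the ordinary list predicate: n > 0.
    return n > 0


def short_list_guard(n):
    # Guard of the variant describing lists of length less than 5: n > 0 and n < 5.
    return 0 < n < 5


def any_length(e4):
    return e4 >= 0


def at_most_three(e4):
    return 0 <= e4 <= 3


ordinary = pattern_condition_holds(any_length, unbounded_guard)
short_unconstrained = pattern_condition_holds(any_length, short_list_guard)
short_constrained = pattern_condition_holds(at_most_three, short_list_guard)
```

For the ordinary list predicate the check always succeeds, which is why it can be skipped; for the bounded variant it succeeds only when the state itself limits the length of the existing segment.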

Condition ($[\bar{x}]\ (\Sigma) \xrightarrow{\text{pat}} (\Sigma')\ [\bar{y}]_{\Pi} \in \Delta$)

This  condition  ensures  that  the  pattern  we  are  considering  is  one  of  the  provided  pat¬ 
terns.  There  may  be  multiple  applicable  patterns  at  any  single  point  during  the  abstraction 
process.  In  such  cases,  any  pattern  can  be  chosen  without  violating  soundness.  The  order 
in  which  patterns  are  applied  can  affect  the  performance  of  our  instrumentation  analysis. 
In the implementation, we adopt the heuristic of matching “longest” rules first. That is, we
prefer to apply patterns whose left-hand side specifies a longer formula, where length
is defined as the number of spatial predicates appearing in that left-hand side.
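
The longest-first heuristic amounts to a sort on the size of each pattern's left-hand side. The following is a minimal sketch, assuming a hypothetical encoding in which a pattern is a dictionary whose "lhs" entry lists its spatial predicates; the names are illustrative and are not taken from the tool.

```python
def order_longest_first(patterns):
    """Sort patterns so those with more spatial predicates on the left come first."""
    # sorted() is stable, so patterns of equal length keep their given order.
    return sorted(patterns, key=lambda pat: len(pat["lhs"]), reverse=True)


# Hypothetical encodings of two singly-linked-list patterns.
patterns = [
    {"name": "pto-to-ls", "lhs": ["x |-> [next: z]"]},
    {"name": "ls-append", "lhs": ["ls(n1; x, y)", "ls(n2; y, z)"]},
]
ordered = [pat["name"] for pat in order_longest_first(patterns)]
```

Since any applicable pattern may be chosen without violating soundness, an ordering like this affects only the shape and size of the result, not its correctness.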

Second  Rule 

The  second  rule  in  Figure  5.13  simply  discards  arithmetic  constraints  collected  during 
symbolic  execution  to  prevent  these  from  interfering  with  convergence.  An  abstract  do¬ 
main  for  integer  variables  could  also  be  used,  as  in  [Chang  and  Rival,  2008]. 

The rules in Figure 5.13 can be automated provided that the existence of the substitution
$\sigma$ in the first rule can be automatically checked for each element of $\Delta$. To accomplish
this, we guide the search for $\sigma$ by the assumption $\varphi \equiv \exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0$. Given some
symbolic state formula $\varphi_1 \equiv \exists \bar{x}_1.\ \Sigma_1 \wedge \Pi_1$, we search $\Sigma_1$ for some collection of spatial
predicates matching $\Sigma$, modulo some unifying substitution $\sigma$. If the search fails, we move
on to the next element of $\Delta$. If the search fails for all elements of $\Delta$, then we conclude
that there is no $\varphi'$, $c$ related to $\varphi$ by $\xrightarrow{\text{abs}}$.
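
The search for a unifying substitution can be sketched as follows. This is only an illustration under stated assumptions: spatial predicates are encoded as hypothetical `(name, args)` tuples, the pattern used in the example (two consecutive points-to facts) is made up for the demonstration, and the free-variable side conditions of the rule are not checked.

```python
from itertools import permutations


def unify(pattern_pred, state_pred, sigma, pattern_vars):
    """Extend sigma so that pattern_pred matches state_pred, or return None."""
    (pname, pargs), (sname, sargs) = pattern_pred, state_pred
    if pname != sname or len(pargs) != len(sargs):
        return None
    sigma = dict(sigma)  # work on a copy so failed attempts leave no trace
    for p, s in zip(pargs, sargs):
        if p in pattern_vars:
            # A pattern variable must be bound consistently everywhere.
            if sigma.setdefault(p, s) != s:
                return None
        elif p != s:
            # Constants such as "nil" must match exactly.
            return None
    return sigma


def find_match(pattern_lhs, pattern_vars, state_preds):
    """Return (sigma, rest) for the first sub-collection of state_preds
    that matches pattern_lhs, or None if no such collection exists."""
    for chosen in permutations(range(len(state_preds)), len(pattern_lhs)):
        sigma = {}
        for pat, idx in zip(pattern_lhs, chosen):
            sigma = unify(pat, state_preds[idx], sigma, pattern_vars)
            if sigma is None:
                break
        else:
            rest = [p for i, p in enumerate(state_preds) if i not in chosen]
            return sigma, rest
    return None


# State: a |-> [next: b] * b |-> [next: nil] * c |-> [next: nil]
state = [("pto", ("a", "b")), ("pto", ("b", "nil")), ("pto", ("c", "nil"))]
# Hypothetical pattern left-hand side: x |-> [next: y] * y |-> [next: z]
lhs = [("pto", ("x", "y")), ("pto", ("y", "z"))]
match = find_match(lhs, {"x", "y", "z"}, state)
```

Here `rest` plays the role of $\Sigma_0$: it is the part of the state that does not participate in the replacement. Moving on to the next pattern when `find_match` returns `None` corresponds to trying the next element of the pattern set.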

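$$\frac{\begin{array}{c}
[\bar{x}]\ (\Sigma) \xrightarrow{\text{pat}} (\Sigma')\ [\bar{y}]_{\Pi} \in \Delta \\
\mathrm{dom}(\sigma) = \bar{x} \qquad \varphi \equiv \exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0 \qquad \varphi \Rightarrow \exists \bar{y}.\ \sigma(\Pi) \\
(fv(\sigma(\Sigma)) - fv(\sigma(\Sigma'))) \subseteq \bar{x}_0 \qquad (fv(\sigma(\Sigma)) - fv(\sigma(\Sigma'))) \cap fv(\Sigma_0) = \emptyset
\end{array}}{\varphi \xrightarrow{\text{abs}} (\exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma')) \wedge \Pi_0 \mid \bar{y} := ?;\ \mathtt{assume}(\sigma(\Pi)))}\quad \bar{y} \notin fv(\varphi)$$

$$\varphi \wedge (e_1 < e_2) \xrightarrow{\text{abs}} (\varphi \mid \epsilon)$$

Figure 5.13: Main rewrite rules for abstraction. We use the notation $\bar{x} := ?$ to indicate
$x_1 := ?; \ldots; x_n := ?$.

Soundness

We have the following soundness theorem for $\xrightarrow{\text{abs}}$.

Theorem 32. If all patterns in $\Delta$ are sound, and $\Gamma \vdash \{\varphi_2\}\ \hat{k} \blacktriangleright_{IVar} k$ for some $\Gamma$, $\hat{k}$, $k$, and
$\varphi_1 \xrightarrow{\text{abs}} (\varphi_2 \mid c)$, then $\Gamma \vdash \{\varphi_1\}\ (c, \hat{k}) \blacktriangleright_{IVar} k$.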

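Proof. The proof follows fairly directly from Definition 35 and the rules for instrumentation
given in Figure 4.1. The case for the second rule is immediate as $\varphi \wedge (e_1 < e_2) \Rightarrow \varphi$
and so the conclusion follows from Strengthening.

Turning to the first rule, our goal is to show the following.

$$\Gamma \vdash \{\varphi\}\ \bar{y} := ?;\ \mathtt{assume}(\sigma(\Pi));\ \hat{k} \blacktriangleright_{IVar} k$$

We will work backward from this to our assumption that $\Gamma \vdash \{\exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0\}\ \hat{k} \blacktriangleright_{IVar} k$.
We have from the assumptions of this rule that $\varphi \equiv \exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0$ and
$\varphi \Rightarrow \exists \bar{y}.\ \sigma(\Pi)$. Together, these give us the following.

$$\varphi \Rightarrow (\exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0) \wedge \exists \bar{y}.\ \sigma(\Pi)$$

Our side-condition that $\bar{y} \notin fv(\varphi)$ and the fact that $\varphi \equiv \exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0$ gives
us that $\bar{y} \notin fv(\exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0)$. This lets us move the existential quantifier to the
front of the consequent, obtaining

$$\varphi \Rightarrow \exists \bar{y}.\ (\exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0) \wedge \sigma(\Pi)$$

Thus, by Strengthening, if we can show the following, we will have proved this case.

$$\Gamma \vdash \{\exists \bar{y}.\ (\exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0) \wedge \sigma(\Pi)\}\ \bar{y} := ?;\ \mathtt{assume}(\sigma(\Pi));\ \hat{k} \blacktriangleright_{IVar} k$$

By Inst-Exists, we will have the goal if we can show

$$\Gamma \vdash \{(\exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0) \wedge \sigma(\Pi)\}\ \mathtt{assume}(\sigma(\Pi));\ \hat{k} \blacktriangleright_{IVar} k$$

And again working backward from this goal, using rule Inst-Assume this time, we must
show that

$$\Gamma \vdash \{(\exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0) \wedge \sigma(\Pi)\}\ \hat{k} \blacktriangleright_{IVar} k$$

We can weaken the precondition by dropping $\sigma(\Pi)$. We do so, applying Strengthening
to reduce our goal to

$$\Gamma \vdash \{\exists \bar{x}_0.\ (\Sigma_0 * \sigma(\Sigma)) \wedge \Pi_0\}\ \hat{k} \blacktriangleright_{IVar} k$$

This is one of our assumptions, so the case is proved. □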

abstract 

The code for our function abstract is given on page 275. We use a comma for concatenation,
so the operation c, c' gives the concatenation of c and c'. We will show that this
function satisfies the specification given in Figure 5.7.
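
As a rough illustration of the loop being verified, the sketch below iterates a rewrite step until none applies, accumulating instrumentation commands by concatenation. The `step` argument stands in for the abstraction relation, and the toy step used here (merging adjacent list-segment lengths under fresh names) is a hypothetical stand-in, not one of the actual rules.

```python
import itertools


def abstract(phi0, step):
    """Apply `step` until it no longer applies; return the final state and
    the concatenation of all instrumentation commands emitted on the way."""
    phi, cmds = phi0, []
    while True:
        result = step(phi)
        if result is None:           # no phi', c' with phi -abs-> (phi' | c')
            return phi, cmds
        phi, new_cmds = result
        cmds = cmds + new_cmds       # c := c, c'


def make_merge_step():
    """Toy rewrite step: merge the first two segment lengths into a fresh one."""
    fresh = (f"m{i}" for i in itertools.count(1))

    def step(phi):
        if len(phi) < 2:
            return None
        a, b, *rest = phi
        n = next(fresh)
        return (n, *rest), [f"{n} := ?", f"assume({n} = {a} + {b})"]

    return step


final_state, commands = abstract(("n1", "n2", "n3"), make_merge_step())
```

The loop only terminates if the rewrite step eventually stops applying, which holds for this toy step because each application shortens the state.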

The  invariant  for  the  loop  is  the  following. 

Invariant 

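$$\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\varphi_0\}\ (c, \hat{k}) \blacktriangleright_{IVar} k$$

Initially Holds First we show that this is satisfied initially. abstract($\varphi_0$) sets $\varphi$ equal
to $\varphi_0$ and $c$ equal to $\epsilon$. Thus, we must show that

$$\Gamma \vdash \{\varphi_0\}\ \hat{k} \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\varphi_0\}\ (\epsilon, \hat{k}) \blacktriangleright_{IVar} k$$

Since $\epsilon, \hat{k} = \hat{k}$, this is immediate.

Inductively Holds Next, we assume that we have the loop invariant at the current values
of $\varphi$ and $c$, which we will refer to as $\varphi_1$ and $c_1$.

$$\Gamma \vdash \{\varphi_1\}\ \hat{k} \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\varphi_0\}\ (c_1, \hat{k}) \blacktriangleright_{IVar} k$$

We also assume that we have

$$\varphi_1 \xrightarrow{\text{abs}} (\varphi' \mid c')$$

Now, to show that one execution of the loop preserves this invariant, we assume we
have executed $\varphi := \varphi'$ and $c := c_1, c'$. We then show that the loop invariant is
re-established. That is, the following holds.

$$\Gamma \vdash \{\varphi'\}\ \hat{k} \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\varphi_0\}\ (c_1, c'), \hat{k} \blacktriangleright_{IVar} k$$

We first assume $\Gamma \vdash \{\varphi'\}\ \hat{k} \blacktriangleright_{IVar} k$. By Theorem 32 we then have $\Gamma \vdash \{\varphi_1\}\ (c', \hat{k}) \blacktriangleright_{IVar} k$.
The loop invariant from previous iterations then gives us $\Gamma \vdash \{\varphi_0\}\ c_1, (c', \hat{k}) \blacktriangleright_{IVar} k$.
Since $(c_1, c'), \hat{k} = c_1, (c', \hat{k})$ we have now established the conclusion of the loop invariant
for this iteration.

Implies Specification Finally we show that the loop invariant implies the specification.
The invariant is

$$\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\varphi_0\}\ (c, \hat{k}) \blacktriangleright_{IVar} k$$

and the specification requires that if abstract($\varphi_0$) returns $(\varphi, c)$ then the following
holds

$$\Gamma \vdash \{\varphi\}\ \hat{k} \blacktriangleright_{IVar} k \quad \text{implies} \quad \Gamma \vdash \{\varphi_0\}\ (c, \hat{k}) \blacktriangleright_{IVar} k$$

As the two implications are the same, the proof is complete.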


Function abstract($\varphi_0$). Returns a weaker symbolic state $\varphi'$ along with a list of
instrumentation commands associated with the transition from $\varphi$ to $\varphi'$. The operation
c, c' gives the concatenation of c and c'.

    φ := φ0
    c := ε
    while ∃φ', c'. φ -abs-> (φ' | c') do
        φ := φ'
        c := c, c'
    end
    return (φ, c)


5.7.4 Additional Comments

There is much more that can be said about abstraction. For some starting points in the
context of shape analysis with separation logic, see [Yang et al., 2008, Chang et al., 2007,
Chang and Rival, 2008]. Each of these presents a different take on what criteria to use
when deciding whether or not to weaken a formula and by how much. In particular, [Yang
et al., 2008] notes the importance of keeping track of whether predicate instances are
known to represent non-empty data structures. Depending on other details of the language
of symbolic state formulae, this information can be necessary to prove certain examples.

Non-emptiness  information  is  not  preserved  by  the  abstraction  patterns  presented  in 
the  previous  section,  though  our  implementation  does  have  a  command  line  parameter  to 
toggle  tracking  of  non-emptiness  information.  In  the  treatment  of  abstraction  just  pre¬ 
sented,  we  chose  to  concentrate  on  the  core  idea  of  abstraction,  which  is  the  use  of  the 
spatial  portion  of  the  heap  to  guide  the  selection  and  application  of  abstraction  rules.  The 
rules  themselves  can  be  made  to  keep  more  or  less  information,  and  the  conditions  that 
trigger  them  can  be  adjusted,  but  the  basic  matching  strategy  is  the  same  in  all  current 
systems of which the authors are aware.

5.8 Example (continued)

Now  that  we  have  a  definition  for  abstract,  we  return  to  our  list  traversal  example, 
reproduced  below. 

    L1 : (1) branch x ≠ nil ⇒ (2) x := x.next; (3) goto L1,
                    x = nil ⇒ (4) halt end

We had previously obtained the following formula just prior to evaluating the goto $L_1$
statement which triggered a call to abstract.

$$\exists x'.\ (x' \mapsto [next : x] * \mathit{ls}(n_0; x, nil))$$

We  will  now  execute  our  new  definition  of  abstract  with  the  following  abstraction 
patterns.  These  are  the  actual  patterns  used  by  our  tool  for  singly-linked  lists. 

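$$[x, y, z, n_0]\ (x \mapsto [next : y] * \mathit{ls}(n_0; y, z)) \xrightarrow{\text{pat}} (\mathit{ls}(n; x, z))\ [n]_{n = n_0 + 1} \qquad (5.15)$$

$$[x, y, z, n_0]\ (\mathit{ls}(n_0; x, y) * y \mapsto [next : z]) \xrightarrow{\text{pat}} (\mathit{ls}(n; x, z))\ [n]_{n = n_0 + 1} \qquad (5.16)$$

$$[x, y, z, n_1, n_2]\ (\mathit{ls}(n_1; x, y) * \mathit{ls}(n_2; y, z)) \xrightarrow{\text{pat}} (\mathit{ls}(n; x, z))\ [n]_{n = n_1 + n_2} \qquad (5.17)$$

$$[x, z]\ (x \mapsto [next : z]) \xrightarrow{\text{pat}} (\mathit{ls}(n; x, z))\ [n]_{n = 1} \qquad (5.18)$$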

We can abstract $\exists x'.\ (x' \mapsto [next : x] * \mathit{ls}(n_0; x, nil))$ by applying (5.18) to obtain

$$\exists x'.\ \mathit{ls}(n_1; x', x) * \mathit{ls}(n_0; x, nil) \qquad (5.19)$$

along with the instrumentation commands $n_1 := ?;\ \mathtt{assume}(n_1 = 1)$. This formula will be
an invariant at $L_1$, as we can see by executing geninstCont starting from this state. If
we do this, the formula we obtain at location (3), just before goto $L_1$, is

$$\exists x', x_2.\ \mathit{ls}(n_1; x', x_2) * (x_2 \mapsto [next : x]) * \mathit{ls}(n_2; x, nil)$$

along with the instrumentation command $n_2 := ?;\ \mathtt{assume}(n_0 = n_2 + 1)$. Now we can execute
implies to verify that this formula in fact implies the invariant (5.19). implies
first calls abstract, obtaining instrumentation commands $n_3 := ?;\ \mathtt{assume}(n_3 = n_1 + 1)$
and state formula

$$\exists x'.\ \mathit{ls}(n_3; x', x) * \mathit{ls}(n_2; x, nil)$$

Next we search for a frame inference proof, using InstL to match $n_2$ to $n_0$ and $n_3$ to
$n_1$. This results in instrumentation commands $n_0 := n_2;\ n_1 := n_3$. Note that implies
calls abstract before performing the frame inference proof. This compensates for the
fact  that  the  frame  inference  system  does  not  contain  a  rule  to  expand  inductive  predicate 
instances  on  the  right  (and  not  having  such  a  rule  in  frame  inference  is  useful  as  this 
reduces  the  proof  space  that  must  be  searched). 

Combining all this, the entire process results in the instrumented continuation in Figure
5.14. Note that since there are two symbolic state formulae associated with $L_1$ in the final
version of $\Gamma$ (the initial state and the discovered invariant) we have a non-deterministic
choice between the instrumentations corresponding to each element of $\Gamma(L_1)$.

There are a number of simplifications that can be made to this program while retaining
the same semantics. For example, the sequence of commands $n_1 := ?;\ \mathtt{assume}(n_1 = 1)$
is equivalent to $n_1 := 1$. We proved this in Section 4.1.3 in the context of the derivability
of the Inst-Assign rule. Similarly, $n_3 := ?;\ \mathtt{assume}(n_3 = n_1 + 1)$ is equivalent to
$n_3 := n_1 + 1$. Noting that $\mathtt{assume}(n = n_0 + 1)$ is equivalent to $\mathtt{assume}(n_0 = n - 1)$ allows
us to also rewrite $n_0 := ?;\ \mathtt{assume}(n = n_0 + 1)$ to the command $n_0 := n - 1$.

We can also eliminate intermediate writes. The sequence $n_3 := n_1 + 1;\ \ldots;\ n_1 := n_3$
can be reduced to $n_1 := n_1 + 1$ in cases where $n_3$ is not read or written by other commands.
Simplification based on these equivalences is implemented in our tool for list-based data
structures. This results in a quite dramatic reduction in the size of the instrumented program.
The simplified program for this example is given in Figure 5.15.
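
The first of these equivalences can be sketched as a peephole pass over a command list. The tuple encoding of commands below is a hypothetical one chosen for the illustration, and the pass handles only the direct form `x := ?; assume(x = e)` where `x` does not occur in `e`; it does not solve for the variable as in the $n_0 := n - 1$ rewrite.

```python
def simplify(cmds):
    """Rewrite `x := ?; assume(x = e)` into `x := e` when x does not occur in e.

    Commands are tuples: ("havoc", x), ("assume_eq", x, e), ("assign", x, e),
    where e is a tuple of tokens such as ("n1", "+", "1").
    """
    out, i = [], 0
    while i < len(cmds):
        cur = cmds[i]
        nxt = cmds[i + 1] if i + 1 < len(cmds) else None
        if (cur[0] == "havoc" and nxt is not None and nxt[0] == "assume_eq"
                and nxt[1] == cur[1] and cur[1] not in nxt[2]):
            out.append(("assign", cur[1], nxt[2]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


before = [
    ("havoc", "n3"), ("assume_eq", "n3", ("n1", "+", "1")),
    ("havoc", "n1"), ("assume_eq", "n1", ("1",)),
    ("goto", "L1"),
]
after = simplify(before)
```

The occurrence check matters: `n := ?; assume(n = n + 1)` blocks every execution, so it must not be turned into the assignment `n := n + 1`.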

Such simplifications are possible because the instrumentation commands for lists are
deterministic. For data structures like trees, where an instrumentation based on tracking
the size of the tree is inherently non-deterministic, such translations of assume statements
to assignments no longer apply. That is not to say, however, that there is no hope of
simplifying more complex examples. Even though the non-determinism is an important
part of the instrumentation for branching data structures, the approach presented in this
section still produces unnecessary intermediate variables. When passing our numeric programs
to external tools, the number of variables is often an important quantity that we
would like to minimize. Finding methods of eliminating these unnecessary intermediate
variables in the general case is ongoing work.

    L1 : branch
      true ⇒
        branch
          x ≠ nil ⇒ assume(true);
            branch
              n = 0 ⇒ assume(true); assume(false); halt,
              n > 0 ⇒
                n0 := ?; assume(n = n0 + 1); x := x.next;
                n1 := ?; assume(n1 = 1); goto L1
            end,
          x = nil ⇒ assume(true); halt
        end
      true ⇒
        branch
          x ≠ nil ⇒ assume(true);
            branch
              n0 = 0 ⇒ assume(true); assume(false); halt,
              n0 > 0 ⇒
                n2 := ?; assume(n0 = n2 + 1); x := x.next;
                n3 := ?; assume(n3 = n1 + 1); n0 := n2; n1 := n3;
                goto L1
            end
          x = nil ⇒ assume(true); assume(false); halt
        end
    end

    Γ(L1) = { ls(n; x, nil),
              ∃x'. (ls(n1; x', x) * ls(n0; x, nil)) }

Figure 5.14: The full instrumentation of the singly-linked list example.

    L1 : branch
      true ⇒
        branch
          x ≠ nil ⇒ assume(true);
            branch
              n = 0 ⇒ assume(true); assume(false); halt,
              n > 0 ⇒
                n0 := n - 1; x := x.next;
                n1 := 1; goto L1
            end,
          x = nil ⇒ assume(true); halt
        end
      true ⇒
        branch
          x ≠ nil ⇒ assume(true);
            branch
              n0 = 0 ⇒ assume(true); assume(false); halt,
              n0 > 0 ⇒
                n0 := n0 - 1; x := x.next;
                n1 := n1 + 1; goto L1
            end
          x = nil ⇒ assume(true); assume(false); halt
        end
    end

    Γ(L1) = { ls(n; x, nil),
              ∃x'. (ls(n1; x', x) * ls(n0; x, nil)) }

Figure 5.15: A simplified version of the instrumentation given in Figure 5.14.

5.9 Tracking Flow of Control

Note  that  the  instrumented  program  produced  for  our  example  contains  some  paths  that 
we  know  to  be  infeasible.  For  example,  it  should  not  be  possible  to  start  at  the  initial  state 
and  immediately  execute  the  second  case  of  the  main  branch.  This  case  was  generated 
from  the  precondition 

$$\exists x'.\ (\mathit{ls}(n_1; x', x) * \mathit{ls}(n_0; x, nil))$$

but this formula does not hold in the initial state of $\mathit{ls}(n; x, nil)$ (the variables $n_0$ and $n_1$
have not yet been assigned values). We can rule out such spurious paths in the following
way. We number each element of $\Gamma(L_1)$ and add an instrumentation variable that tracks
which precondition was supplied for the current execution of the code at $L_1$. This counter
is initially set to the value corresponding to the initial state. If we make this change, giving
the initial state number 1 and the invariant number 2, and using $p$ to track the precondition
from which we are executing, we obtain the code in Figure 5.16. Control now begins at
$L_0$ so that $p$ can be assigned the correct value.

We can apply this control-flow-tracking transformation to the general case. Currently,
when we emit the final instrumented continuation in instrument, we iterate
over each continuation in the original program, emitting a branch of the form
branch true ⇒ $k_1$, ..., true ⇒ $k_n$ end where $k_1, \ldots, k_n$ are instrumentations of the original
continuation starting from different preconditions. If we number the preconditions
from 1 to $n$, we can track viable paths more precisely by emitting a branch of the form

    branch (p = 1) ⇒ k1, ..., (p = n) ⇒ kn end

Then, in geninstCont, when we process a goto $l$ command and discover that the current
state implies the $i$th element in the set $\Gamma(l)$, we emit the instrumentation command $p := i$
just prior to the goto $l$ statement.
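
The two emission steps can be sketched as small string-building helpers. The function names and the textual output format are hypothetical and chosen only to mirror the notation above; they are not the tool's actual interface.

```python
def emit_dispatch(instrumented):
    """Build `branch (p = 1) => k1, ..., (p = n) => kn end` from the
    instrumentations k1..kn of one continuation, numbered from 1."""
    cases = [f"(p = {i}) => {k}" for i, k in enumerate(instrumented, start=1)]
    return "branch " + ", ".join(cases) + " end"


def emit_goto(label, i):
    """Record which precondition of `label` was established, then jump."""
    return [f"p := {i}", f"goto {label}"]


dispatch = emit_dispatch(["k1", "k2"])
jump = emit_goto("L1", 2)
```

Because `p` is set immediately before each jump, the guard `(p = i)` selects exactly the instrumentation that was generated from the precondition the jump established.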

This records in the code more information about feasible paths. However, not all external
tools will make use of this information. It is common for program analysis tools to
handle control flow and data differently. Thus, our trick of encoding control flow information
in an extra integer-valued variable may not work. In such cases, since the domain of
our $p$ variable is finite, we can fully unroll the program with respect to $p$, as is commonly
done in bounded model checking [Biere et al., 1999], before passing it to the analysis tool.

    L0 : p := 1; goto L1

    L1 : branch
      p = 1 ⇒
        branch
          x ≠ nil ⇒ assume(true);
            branch
              n = 0 ⇒ assume(true); assume(false); halt,
              n > 0 ⇒
                n0 := n - 1; x := x.next; n1 := 1;
                p := 2; goto L1
            end,
          x = nil ⇒ assume(true); halt
        end
      p = 2 ⇒
        branch
          x ≠ nil ⇒ assume(true);
            branch
              n0 = 0 ⇒ assume(true); assume(false); halt,
              n0 > 0 ⇒
                n0 := n0 - 1; x := x.next; n1 := n1 + 1;
                p := 2; goto L1
            end
          x = nil ⇒ assume(true); halt
        end
    end

    Γ(L0) = { ls(n; x, nil) }
    Γ(L1) = { ls(n; x, nil) ∧ p = 1,
              ∃x'. (ls(n1; x', x) * ls(n0; x, nil)) ∧ p = 2 }

Figure 5.16: An instrumentation of the singly-linked list example that tracks flow of control using
a variable p.


5.10  Translating  Branch  Conditions 

We  will  now  consider  what  happens  when  we  want  to  prove  a  property  of  our  example 
program.  Suppose  we  are  interested  in  showing  termination,  and  in  using  an  external  ter¬ 
mination  prover  to  do  the  termination  reasoning.  Then  we  first  convert  the  instrumented 
program  that  we  have  produced  to  a  numeric  program  using  the  projection  operation  de¬ 
fined  in  Section  4.4.  The  result  of  the  operation  is  given  in  Figure  5.17,  where  we  have 
projected  the  program  onto  the  set  of  instrumentation  variables  IVar.  The  result  is  that  the 
branch conditions involving $x$ become true and the x := x.next commands disappear.
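
The effect described here can be approximated by a simple filter over commands. This is only a rough sketch: the precise projection is the one defined in Section 4.4, and the tuple encoding, the variable set, and the helper names below are assumptions made for the illustration.

```python
def project_cmd(cmd, ivars):
    """Project one command onto the instrumentation variables `ivars`."""
    kind = cmd[0]
    if kind == "assign":                 # ("assign", target, vars_read)
        return cmd if cmd[1] in ivars else None
    if kind == "guard":                  # ("guard", text, vars_read)
        return cmd if set(cmd[2]) <= ivars else ("guard", "true", ())
    return cmd                           # goto, halt, ... are kept as-is


def project(cmds, ivars):
    """Drop heap assignments and replace heap-dependent guards with true."""
    projected = (project_cmd(c, ivars) for c in cmds)
    return [c for c in projected if c is not None]


IVAR = {"n", "n0", "n1", "p"}
path = [
    ("guard", "x != nil", ("x",)),
    ("guard", "n > 0", ("n",)),
    ("assign", "n0", ("n",)),
    ("assign", "x", ("x",)),             # x := x.next
    ("assign", "n1", ()),
    ("assign", "p", ()),
    ("goto", "L1"),
]
numeric_path = project(path, IVAR)
```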

The example does terminate in all cases, as the branch that executes goto $L_1$ in the
$p = 2$ case is guarded by $n_0 > 0$. This condition cannot remain true forever since this
branch also decreases $n_0$. However, there are important properties of the program that are
not captured by this abstraction. Specifically, while the program will always terminate, it is
allowed to “terminate early.” The instrumented program terminates exactly when $n_0 = 0$;
however, the numeric abstraction may terminate with any value of $n_0$ (by executing the
second true branch in the $p = 2$ case of $L_1$).

As  with  our  discussion  of  flow  of  control  in  the  previous  section,  the  result  is  still 
sound,  but  the  program  contains  paths  that  are  known  to  be  spurious.  Thus  we  can  obtain 
a  more  precise  abstraction  if  we  can  rule  out  these  paths. 
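
Both observations can be seen by exhaustively exploring the numeric program of Figure 5.17 for a fixed initial `n`. The sketch below is an illustration rather than a proof: it hand-encodes that program's transitions, treats a path through assume(false) as having no executions, and assumes `n` is a non-negative integer.

```python
def explore(n):
    """Enumerate all halting paths of the numeric program for initial n.

    Returns a list of (loop_iterations, final_n0) pairs; final_n0 is None
    when the program halts before n0 has been assigned.
    """
    results = []
    stack = [(1, None, None, 0)]         # states are (p, n0, n1, iterations)
    while stack:
        p, n0, n1, steps = stack.pop()
        # Second `true` case: halt immediately, whatever the counters are.
        results.append((steps, n0))
        # First `true` case: the branch with counter = 0 hits assume(false),
        # so only the counter > 0 branch contributes executions.
        counter = n if p == 1 else n0
        if counter > 0:
            if p == 1:
                stack.append((2, n - 1, 1, steps + 1))
            else:
                stack.append((2, n0 - 1, n1 + 1, steps + 1))
    return results


paths = explore(3)
```

For `n = 3` every path halts after at most three iterations, but some halt with `n0` still positive, which is the early termination discussed above.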

Consider  the  program  below,  which  iterates  through  a  list  and  then  checks  that  x  =  nil 
following  the  traversal  (aborting  if  this  does  not  hold).  Triggering  the  abort  in  this  program 
is  not  possible. 

    L1 : (1) branch x ≠ nil ⇒ (2) x := x.next; (3) goto L1,
                    x = nil ⇒ (4) goto L2 end
    L2 : (5) branch x ≠ nil ⇒ (6) abort,
                    x = nil ⇒ (7) halt end

    L0 : p := 1; goto L1

    L1 : branch
      p = 1 ⇒
        branch
          true ⇒ assume(true);
            branch
              n = 0 ⇒ assume(true); assume(false); halt,
              n > 0 ⇒
                n0 := n - 1; n1 := 1;
                p := 2; goto L1
            end,
          true ⇒ assume(true); halt
        end
      p = 2 ⇒
        branch
          true ⇒ assume(true);
            branch
              n0 = 0 ⇒ assume(true); assume(false); halt,
              n0 > 0 ⇒
                n0 := n0 - 1; n1 := n1 + 1;
                p := 2; goto L1
            end
          true ⇒ assume(true); halt
        end
    end

Figure 5.17: The numeric program corresponding to the program in Figure 5.16.


A simplified version of a numeric program for this code is given below. For each
branch condition, we write in square brackets the original program branch condition,
if any, associated with that branch. We have eliminated the branches of the form
n = 0 ⇒ assume(true); assume(false) since the assume(false) ensures that there are
no executions along this branch. We then replaced the single remaining “n > 0 ⇒ ...”
branch with “assume(n > 0); ...,” which is equivalent.

    L0 : p := 1; goto L1

    L1 : branch
      p = 1 ⇒
        branch
          true [x ≠ nil] ⇒ assume(true); assume(n > 0);
            n0 := n - 1; n1 := 1;
            p := 2; goto L1
          true [x = nil] ⇒ assume(true); goto L2
        end
      p = 2 ⇒
        branch
          true [x ≠ nil] ⇒ assume(true); assume(n0 > 0);
            n0 := n0 - 1; n1 := n1 + 1;
            p := 2; goto L1
          true [x = nil] ⇒ assume(true); goto L2
        end
    end

    L2 : branch
      true [x ≠ nil] ⇒ abort
      true [x = nil] ⇒ halt
    end

There  are  two  types  of  assume  commands  that  have  been  inserted  here.  The 
assume(n  >  0)  and  assume(n0  >  0)  commands  came  from  expanding  the  list  segment 
predicate  in  order  to  prove  that  x  is  in  the  heap  for  the  processing  of  the  x  :=  x.next 
command.  The  assume(true)  statements  come  from  the  call  to  branchAnnot  in 
geninstCont.  Because  the  DefL  rule  in  frame  inference  is  the  only  operation  that 
inserts  instrumentation  branches  into  the  code,  we  will  only  record  information  about  n 
and $n_0$ when we are forced to expand an inductive predicate. Branches such as those associated
with the $x \neq nil$ conditions in $L_1$ and $L_2$, which do not access the heap following
the branch, do not result in information about $n$ and $n_0$ being recorded.

What  we  would  like  to  do  is  incorporate  into  the  automated  analysis  some  version  of 
the  Inst-BranchTrans  derived  rule  from  Section  4.1.3.  To  do  so,  we  need  some  method 
of finding pure formulae implied by the current symbolic heap. One approach is suggested
by our DefL rule and the fact that branches that make use of DefL already end up recording
some information about the instrumentation variables. This occurs because DefL case
splits on the conditions associated with an inductive predicate and then LeftPureFalse
effectively prunes any impossible branches, thus recording in the code which values of the
instrumentation variables are consistent with the current symbolic state.

One approach to recording more information at branch points is to have branchAnnot
eagerly try to expand all inductive predicates in the current symbolic state in order to test
which expansions are consistent. This can be accomplished fairly easily and generally by
augmenting our system for frame inference. We add support for pure abduction, which
is similar to the abductive inference of spatial predicates discussed in [Calcagno et al.,
2009] but discovers pure rather than spatial assumptions. The pure abduction problem is
to produce from $\varphi$ and $\varphi'$ a pure formula $\Pi$ such that $\varphi \wedge \Pi \Rightarrow \varphi'$. To accomplish this we
modify the form of our sequents to the following.

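$$\Pi_a + \Sigma_a \mathbin{\Box} \varphi \Longrightarrow^{fk}_{S} \varphi' \mathbin{/\!/} \Gamma \vdash k$$

We have added a component $\Pi_a$ to the left, which is the pure hypothesis necessary to
guarantee the conclusion. $\Pi_a$ is considered an output in the algorithmic interpretation of
our inference system. A derivation of the new sequent form above guarantees that the
following is derivable in the old system.

$$\Sigma_a \mathbin{\Box} \varphi \wedge \Pi_a \Longrightarrow^{fk}_{S} \varphi' \mathbin{/\!/} \Gamma \vdash k$$

For all rules except DefL, $\Pi_a$ is simply passed unchanged from the hypothesis to the
conclusion. So, for example, PtoMatches becomes

PtoMatches

$$\frac{\Pi_a + \Sigma_a * (e \mapsto \rho) \mathbin{\Box} \varphi \Longrightarrow^{fk}_{S} \varphi' \mathbin{/\!/} \Gamma \vdash k}{\Pi_a + \Sigma_a \mathbin{\Box} (e \mapsto \rho) * \varphi \Longrightarrow^{fk}_{S} \varphi' * (e \mapsto \rho) \mathbin{/\!/} \Gamma \vdash k}$$

The axioms set $\Pi_a$ to true, since when they hold no additional assumptions are necessary.

RightPure

$$\frac{\Pi \Rightarrow \exists \bar{x}.\ \Pi' \qquad fk((\Sigma_a * \Sigma) \wedge \Pi') = \mathrm{Some}(\Gamma, k)}{true + \Sigma_a \mathbin{\Box} \Sigma \wedge \Pi \Longrightarrow^{fk}_{S} emp \wedge \Pi' \mathbin{/\!/} \Gamma \vdash k}$$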

The DefL rule then becomes the following which, rather than requiring all cases to be
provable, instead checks that the conclusion is provable for some subset of the cases. It
then includes the negation of all the cases which are not provable in the constraint $\Pi_a$ that
is returned. The idea is that, if these negations had been provided as assumptions, then all
the non-provable cases would have followed from LeftPureFalse due to the conditions
for those cases being inconsistent with these assumptions. We will present an example
shortly.

We write $I$ to represent a set of integers and write $\mathtt{branch}_{i \in I}$ to represent the branch with
one case for each element $i$ of $I$ (just as $\bigcup_{i \in I}$ represents the union with one component for
each $i \in I$). As is standard, the empty iterated conjunction is equal to true. We write $\neg I$
for the complement of $I$. This is all cases that are not in $I$. So if the cases are $\{1 \ldots n\}$
and $I \subseteq \{1 \ldots n\}$ (as the rule requires), then $\neg I$ is $\{1 \ldots n\} - I$.

DefL 


(d(v)  <=>  Cfv)  |  ...  |  Cn(v))  G  S 
Ci(e j  =  (ilj  :  let  zt  satisfy  IT'  in  pfj  I  C  {1, . . . ,  n} 

\/i  e  i.  (na;  +  £a  []  (p  *  Pi)  a  ii;  a  n'  =>/fc  p’  /  r*  h  %) 

S 

A(n.  =>  n..,)  a  A  An,)  +  Ea  [  ^  *  i(e]  /// 


Vi.  Zi  SojIIj) 


((Jigj  Tj)  h  branch  . . . ,  Ilj  =>•  :=  ?;  assume(II') ;  ktl . . .  end 

'  ~ 1  i£l 


The  assumptions  IIa  that  build  up  can  be  simplified  using  rules  of  Boolean  logic,  as  we 
show  later  in  an  example. 

The  soundness  result  then  becomes  the  following. 

Theorem  33.  If  Ua  +  p  fk  p1  //  T  \~  k  then 

s 

r  v  {p  a  nj  k  ►rvax  k 
Soundness of the augmented proof system is straightforward. For most rules it follows
directly from the induction hypothesis, since $\Pi_a$ is not changed from premise to conclusion.
For the axioms, the same proof can be reused since $\varphi \wedge true \Leftrightarrow \varphi$. For DefL, the
reasoning is similar to that for the original rule in terms of reducing cases to instances of
the induction hypothesis. The main addition is that we must show that the omitted cases
have proofs if we assume $\Pi_a$. But $\Pi_a$ contains the negation of the case conditions for all
omitted cases, so $\varphi \wedge \Pi_a$ implies false in every omitted case, allowing us to prove each of
these cases with LeftPureFalse.

We can now give a definition of branchAnnot that uses this augmented frame inference
procedure to introduce assumptions on instrumentation variables at every branch case
present in the original program. The code for the function is listed on this page. Given
the current symbolic state formula $\varphi$, the function tries to prove for each branch condition
$e_i$ that $\varphi \wedge e_i \Rightarrow false$. It does this by making a call into frame inference. If the proof
search succeeds, then $\Pi_a$ will contain the conditions under which this implication holds.
This makes $\Pi_a$ an under-approximation of the negation of the branch condition. To obtain
an over-approximation of the branch condition, we simply negate $\Pi_a$.


Function branchAnnot($\varphi, e_1, e_2, \ldots, e_n$). Function for annotating original
branches with pure formulae over the instrumentation variables that are guaranteed
to hold by each original branch. $\varphi$ is the current symbolic state and $e_1, \ldots, e_n$ are the
conditions to be translated.

    fun f(φ) =
        return Some(∅, halt)
    in
    foreach e_i do
        if Π_a + φ ∧ e_i ⟹_fk emp ∧ false // Γ ⊢ k then
            e'_i := ¬Π_a
        end
    end
    return (e'_1, ..., e'_n)

We will now show an example demonstrating the use of our augmented version of
frame inference to infer conditions on instrumentation variables. Suppose we have the
following state, using the $\mathit{ls}(n; x, y)$ predicate from earlier.

$$(\mathit{ls}(n_1; x, y) * \mathit{ls}(n_2; y, x)) \wedge n_1 + n_2 > 0$$

This indicates that the heap consists of a non-empty cyclic list with $x$ and $y$ pointing into
it. We will translate the branch condition $x \neq y$ into a condition on $n_1$ and $n_2$. We give the
proof tree in Figure 5.18, following the syntax from Section 5.6, where we annotate each
node in the tree with the name of the rule that is applied and list any parameters that must
be chosen next to the name. Below the name of the rule, we write the output. Since we are
only interested in the set of assumptions that are returned, we only list $\Pi_a$ and omit $\Gamma$ and
$k$. We write not provable for the cases for which no proof can be found.

The derivation below the root of the tree in the figure demonstrates how the condition
that is returned can be simplified to $\neg(n_1 > 0) \vee \neg(n_2 > 0)$. This then gets negated and
used as the assumption for this case. Thus, we have discovered that in the state

$$(\mathit{ls}(n_1; x, y) * \mathit{ls}(n_2; y, x)) \wedge n_1 + n_2 > 0$$

if $x \neq y$ then it is also the case that $n_1 > 0 \wedge n_2 > 0$. We can perform a similar analysis
working from the condition $x = y$. We will get a proof tree like that in Figure 5.18, but the
not provable and LeftPureFalse cases will be flipped. The condition returned will
simplify to $(n_1 \neq 0) \wedge (n_2 \neq 0)$, resulting in an assumption for the case of $(n_1 = 0) \vee (n_2 = 0)$,
exactly the conditions under which the state allows us to conclude $x = y$ (although the
result is not always exact; in general it is an over-approximation of the condition we are
analyzing).
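The simplification step in this example can be checked mechanically. The following is a minimal sketch, assuming the root condition of the proof tree has the shape $(n_1 = 0 \Rightarrow \ldots) \wedge (n_1 > 0 \Rightarrow \ldots)$ described above and that list lengths range over non-negative integers. The function names (`raw_condition`, `simplified`, `branch_assumption`) are hypothetical and are not part of Thor.

```python
from itertools import product


def implies(a, b):
    """Material implication on Python booleans."""
    return (not a) or b


def raw_condition(n1, n2):
    """Condition returned at the root of the proof tree, before simplification."""
    left = implies(n1 == 0, implies(n2 == 0, True) and implies(n2 > 0, True))
    right = implies(n1 > 0, implies(n2 == 0, True) and not (n2 > 0))
    return left and right


def simplified(n1, n2):
    """The simplified form: not(n1 > 0) or not(n2 > 0)."""
    return (not n1 > 0) or (not n2 > 0)


def branch_assumption(n1, n2):
    """Negation of the simplified condition, used as the assumption for x != y."""
    return n1 > 0 and n2 > 0


def check(limit=6):
    """Compare the three formulas on all lengths below `limit` (a bounded check only)."""
    for n1, n2 in product(range(limit), repeat=2):
        assert raw_condition(n1, n2) == simplified(n1, n2)
        assert branch_assumption(n1, n2) == (not simplified(n1, n2))
    return True
```

This is only a bounded enumeration, not a proof. It is meant to make the propositional structure of the simplification easy to inspect.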

We now return to our list traversal example from page 284, in order to insert branch
assumptions and obtain an abstraction that is more precise. Figure 5.19 gives the result.
In the $p = 1$ case, the condition that we obtain for $x \neq \mathsf{nil}$ is $n > 0$ and for $x = \mathsf{nil}$
we obtain $n = 0$. For $p = 2$ the conditions are $n_0 > 0$ and $n_0 = 0$. We have also
expanded the continuation at L2 to account for the fact that it is executed from two different
preconditions.
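Nodes of the proof tree, from the leaves down to the root:

LeftPureFalse (three leaves)
    $\mathit{true}$

ExistsL $[a/z]$
    $\mathit{true}$

ExistsL $[b/z]$
    $\mathit{true}$

ExistsL $[c/z]$
    not provable

DefL $(\mathit{ls}(n_2; y, x))$, cases $n_2 = 0$ and $n_2 > 0$
    $((n_2 = 0) \Rightarrow \mathit{true}) \wedge (n_2 > 0 \Rightarrow \mathit{true})$

DefL $(\mathit{ls}(n_2; y, x))$, cases $n_2 = 0$ and $n_2 > 0$
    $((n_2 = 0) \Rightarrow \mathit{true}) \wedge \neg(n_2 > 0)$

DefL $(\mathit{ls}(n_1; x, y))$, cases $n_1 = 0$ and $n_1 > 0$
    $((n_1 = 0) \Rightarrow ((n_2 = 0) \Rightarrow \mathit{true}) \wedge (n_2 > 0 \Rightarrow \mathit{true})) \wedge ((n_1 > 0) \Rightarrow ((n_2 = 0) \Rightarrow \mathit{true}) \wedge \neg(n_2 > 0))$
    $\leadsto \mathit{true} \wedge (n_1 > 0 \Rightarrow \neg(n_2 > 0))$
    $\leadsto \neg(n_1 > 0) \vee \neg(n_2 > 0)$

Derivation of

$$\Pi_a + (\mathit{ls}(n_1; x, y) * \mathit{ls}(n_2; y, x)) \wedge n_1 + n_2 > 0 \wedge x \neq y \Rightarrow_{fk} \mathsf{emp} \wedge \mathit{false} \;//\; \Upsilon \vdash k$$

Figure 5.18: Proof for the given frame inference query. Below each rule name we show the value
that $\Pi_a$ has in the conclusion of that rule.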

It is now clear due to the additional assume statements that goto L2 can only be
executed in the $p = 1$ case if $n = 0$. The assume(n > 0) that guards the abort command in
L2 then ensures that abort will not be reached in any execution. A similar situation holds
with $n_0$ for $p = 2$.

In this example, unreachability of abort could have been proved with pure heap
reasoning (integer values are not required). However, for more complicated properties, such
as computing upper bounds on variables, and for more complex examples with multiple
integer quantities involved, it can be useful to have a more accurate numeric abstraction.
L0: p := 1; goto L1

L1: branch
      p = 1 ⇒
        branch
          true [x ≠ nil] ⇒ assume(n > 0); assume(n > 0);
                           n0 := n - 1; n1 := 1;
                           p := 2; goto L1
          true [x = nil] ⇒ assume(n = 0); p := 1; goto L2
        end
      p = 2 ⇒
        branch
          true [x ≠ nil] ⇒ assume(n0 > 0); assume(n0 > 0);
                           n0 := n0 - 1; n1 := n1 + 1;
                           p := 2; goto L1
          true [x = nil] ⇒ assume(n0 = 0); p := 2; goto L2
        end
    end

L2: branch
      p = 1 ⇒
        branch
          true [x ≠ nil] ⇒ assume(n > 0); abort
          true [x = nil] ⇒ assume(n = 0); halt
        end
      p = 2 ⇒
        branch
          true [x ≠ nil] ⇒ assume(n0 > 0); abort
          true [x = nil] ⇒ assume(n0 = 0); halt
        end
    end

$\Gamma(L_1) = \{\, \mathit{ls}(n; x, \mathsf{nil}) \wedge p = 1,\ \ \exists x'.\ (\mathit{ls}(n_1; x', x) * \mathit{ls}(n_0; x, \mathsf{nil})) \wedge p = 2 \,\}$
$\Gamma(L_2) = \{\, \mathit{ls}(n; x, \mathsf{nil}) \wedge p = 1,\ \ \exists x'.\ (\mathit{ls}(n_1; x', x) * \mathit{ls}(n_0; x, \mathsf{nil})) \wedge p = 2 \,\}$

Figure 5.19: The numeric program corresponding to the program from page 284 after performing
branch condition annotation. The original branch conditions are given in square brackets.
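To make the role of the assume statements concrete, the numeric program of Figure 5.19 can be explored exhaustively for small inputs. The sketch below is an illustration only. It assumes that `branch` chooses nondeterministically among its arms, that a failing `assume` simply blocks that arm, and that `n0` and `n1` start at 0 (their initial values are not fixed by the figure). The state encoding and the names `successors` and `outcomes` are hypothetical.

```python
def successors(state):
    """Successor states of a state (label, p, n, n0, n1) in the numeric program."""
    label, p, n, n0, n1 = state
    if label == "L0":
        return [("L1", 1, n, n0, n1)]
    arms = []
    if label == "L1" and p == 1:
        if n > 0:    # arm [x != nil]: assume(n > 0)
            arms.append(("L1", 2, n, n - 1, 1))
        if n == 0:   # arm [x = nil]: assume(n = 0)
            arms.append(("L2", 1, n, n0, n1))
    elif label == "L1" and p == 2:
        if n0 > 0:   # arm [x != nil]: assume(n0 > 0)
            arms.append(("L1", 2, n, n0 - 1, n1 + 1))
        if n0 == 0:  # arm [x = nil]: assume(n0 = 0)
            arms.append(("L2", 2, n, n0, n1))
    elif label == "L2" and p == 1:
        if n > 0:    # guarded abort
            arms.append(("abort", p, n, n0, n1))
        if n == 0:
            arms.append(("halt", p, n, n0, n1))
    elif label == "L2" and p == 2:
        if n0 > 0:   # guarded abort
            arms.append(("abort", p, n, n0, n1))
        if n0 == 0:
            arms.append(("halt", p, n, n0, n1))
    return arms


def outcomes(n):
    """Set of terminal labels reachable when the list initially has length n."""
    seen, finals = set(), set()
    stack = [("L0", 0, n, 0, 0)]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        if state[0] in ("halt", "abort"):
            finals.add(state[0])
        else:
            stack.extend(successors(state))
    return finals
```

For every small `n` tried, `outcomes(n)` contains only `"halt"`, which matches the argument in the text that the guarded `abort` is unreachable. A safety checker would establish this for all `n` rather than by enumeration.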
5.11 Experimental Results

We have implemented the techniques described here in the tool Thor [Magill et al., 2008].
Thor takes as input a file containing specifications of inductive predicates and a
C language source file. The source file can optionally be annotated with function pre-
and post-conditions. If pre- and post-conditions are not provided, they are inferred by the
analysis (with the assumption that the heap is empty at the beginning of execution). The
program is analyzed using the data structure specification provided and a numeric program
is generated which can be passed to an external tool for further analysis. The numeric
program can be generated in several formats, matching the input languages of various analysis
tools. The most useful output format is C language source code, as many verification tools
can accept C language source either directly or after some simple translation.

Thor is written in OCaml and uses Yices [Dutertre and Moura, 2006] as the external
theorem prover for discharging pure entailments. It uses the CIL [Necula et al., 2002]
program analysis framework to handle parsing of the C code and to convert the input
to a more regular form (e.g. eliminating switch statements by encoding them using if
statements and gotos).


5.11.1  Simple  Examples 

Table  5.2  summarizes  the  experimental  results  of  verifying  safety  and  termination  of  some 
programs  that  manipulate  different  inductive  data  structures.  For  each  program,  we  use 
Thor  to  produce  the  numeric  abstraction  of  the  original  program.  Then  we  use  Blast 
[Henzinger  et  al.,  2002]  and  ARMC  [Podelski  and  Rybalchenko,  2007]  to  verify  assertion 
safety  and  ARMC-Live  to  check  termination  of  the  numeric  abstraction.  The  results  of 
Blast,  ARMC,  and  ARMC-Live  are  all  consistent  with  the  expected  results  and  thus 
we  only  list  the  timing  information. 

                                                          Safety            Termination
Program           Expected Result       T_na       T_blast    T_armc      T_armc-live

Doubly Linked Lists
copy_zip          safe / terminates      4.862     0.238       7.674       31.683
iter_sum          safe / terminates      1.204     0.342       8.036        9.589

Circular Doubly-Linked Lists
traverse          safe / terminates      1.526     0.046       0.908        1.383
delete            safe / terminates      2.245     0.068      11.138       20.204
meet              safe / diverges        0.760     0.126       1.734        0.180

Circular Linked Lists
sum               safe / terminates      0.827     0.065       1.621        2.582
add_after         safe / terminates      1.072     0.061       4.846       12.342
add_after_loop    safe / diverges        0.997     0.065       1.945        3.364

Skip Lists
create            safe / terminates      9.651     0.122      10.546       34.960
lift              unsafe / diverges     10.464     0.356       5.814      971.090
find_loop         safe / diverges        4.431     0.106      36.860       45.709

Binary Search Trees
insert            safe / terminates      1.550     0.046       0.458        0.895
mem               safe / terminates      0.573     0.042       0.387        2.690

Table 5.2: Experimental results. Time is in seconds. T_na represents the time required to produce
the numeric abstraction. T_blast, T_armc, and T_armc-live represent the time taken to verify the
numeric abstraction by Blast, ARMC, and ARMC-Live respectively.

Most of the programs are common data structure manipulations that involve looping,
e.g. to insert an element into a binary search tree. In such cases termination is the main
property of interest. The first two doubly-linked list examples require the proving of
integer properties in order to guarantee memory safety. For example, copy_zip defines a zip
routine that takes in two lists and returns a list of pairs. The routine assumes that both lists
have the same length and is only memory safe if this holds. The main function then calls
zip with a list plus the result of a list copy operation.

Attempting to construct a standard memory safety proof for such a program fails, as
we cannot show that certain memory accesses do not involve dereferencing nil. To fix this,
we can take each command A[x] that requires a heap cell to exist at x and replace it with
“if $x \neq \mathsf{nil}$ then A[x] else abort.” This yields a program where the assumption that $x \neq \mathsf{nil}$
is available to us when we execute the command A[x], but we are left with potential aborts
in the code. If we can then show that abort is unreachable, by running a safety checker on
the numeric program we generate, then we will have shown memory safety of the original
program. Essentially, we have used the error operation represented by abort to capture
a class of memory errors (those that result from dereferencing nil). The copy_zip and
iter_sum examples are both based on taking this approach to proving memory safety.
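The rewriting of heap-accessing commands into guarded form can be sketched as a simple pass over a command list. This is an illustration under stated assumptions, not the transformation as implemented in Thor: commands are modelled as tuples whose first component is an operation name and whose second component (for heap accesses) is the dereferenced variable, and the operation names `load`, `store`, and `free` are hypothetical.

```python
# Operations assumed to require a heap cell at their pointer argument.
HEAP_ACCESS = {"load", "store", "free"}


def guard_dereferences(commands):
    """Wrap each heap access A[x] as: if x != nil then A[x] else abort."""
    guarded = []
    for cmd in commands:
        if cmd[0] in HEAP_ACCESS:
            pointer = cmd[1]
            guarded.append(("if_not_nil", pointer, cmd, ("abort",)))
        else:
            guarded.append(cmd)  # commands that do not touch the heap are unchanged
    return guarded
```

For instance, `guard_dereferences([("assign", "y", "x"), ("load", "x", "next", "z")])` leaves the assignment alone and wraps the load in a nil check on `x`. Showing that the `("abort",)` arms are unreachable is then the job of the safety checker run on the numeric program.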


5.11.2  Complex  Examples 

We have also run some experiments involving more complicated data structures and
algorithms. These were chosen as motivating examples for work on circuit translation [Cook
et al., 2009a] that requires, as a first step, the computation of a bound on the amount of
memory allocated by a program. To compute this bound, we take a program and replace
instances of $\mathsf{alloc}(f_1, \ldots, f_n)$ with the command $\mathsf{alloc}(f_1, \ldots, f_n);\ \mathit{mem} := \mathit{mem} + 1$.
We also replace free x with free x; mem := mem - 1. If we initialize mem to 0 at the
beginning of the program, then mem will always be a count of the number of memory
cells currently allocated in the heap.
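The effect of this instrumentation can be illustrated with a toy heap model. The sketch below assumes a heap represented as a dictionary from addresses to field lists. The class `CountingHeap` and the helper `read_into_list` are hypothetical names introduced only for this example.

```python
class CountingHeap:
    """A toy heap whose alloc and free also maintain the counter mem."""

    def __init__(self):
        self.cells = {}        # address -> list of field values
        self.next_address = 1  # address 0 is never used; None stands for nil
        self.mem = 0           # instrumentation counter, initialised to 0

    def alloc(self, *fields):
        address = self.next_address
        self.next_address += 1
        self.cells[address] = list(fields)
        self.mem += 1          # alloc(...); mem := mem + 1
        return address

    def free(self, address):
        del self.cells[address]
        self.mem -= 1          # free x; mem := mem - 1


def read_into_list(values):
    """Store each input value in a fresh list cell, as in the n-integer example."""
    heap = CountingHeap()
    head = None
    for value in values:
        head = heap.alloc(value, head)
        # mem always equals the number of currently allocated cells
        assert heap.mem == len(heap.cells)
    return heap, head
```

Reading `n` values leaves `mem` equal to `n`, which is the kind of bound in terms of program inputs that a bounds analysis is asked to discover on the numeric program.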

We can then ask a tool for computing bounds on integer variables to give a bound on
mem in terms of the program inputs. For example, a program that reads in n integers
may store these values in a list, allocating n heap cells in the process. If it performs some
sorting of this list, it then might use auxiliary storage, which we can also bound in terms
of n. Generating a numeric program that captures the connection between the integer n
that is input and subsequent data structure allocations and transformations is the key to
obtaining such bounds.


Priority  Queue  This  example  repeatedly  reads  inputs,  inserting  them  into  a  sorted  list. 
It  then  outputs  the  list  in  sorted  order. 


Merge  Sort  This  example  implements  a  merge  of  two  sorted  sequences. 


Packet Sorting This example processes pairs of identifiers and data. The program reads
in a list of (identifier, data) pairs and filters them as they are read to ensure that if a duplicate
identifier is encountered, the data is discarded. Once it has read in a certain number of
unique elements, it sorts them according to identifier and then outputs the sorted list. This
example mimics the behavior of a simple network device, which would use a similar setup
to process network packets.


Dictionary  This  example  uses  a  binary  search  tree  to  implement  a  dictionary. 


Huffman Encoder This example implements the Huffman encoding algorithm. It reads
in a list of symbols paired with their frequency. It builds a list of one-element trees using
this data. It then repeatedly merges the two trees in the list with the lowest frequencies,
assigning the sum of their frequencies to the resulting tree. The building phase finishes
when the list contains a single tree. The program then processes queries, repeatedly
reading symbols from the input and outputting the binary string corresponding to the encoding
of that symbol.
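The building and query phases described for the Huffman example can be sketched as follows. This is not the benchmark program itself, which is written in C and keeps its trees in a list. The sketch swaps in Python's `heapq` module to find the two lowest-frequency trees, assumes symbols are strings, and uses a counter as a tie-breaker so that trees are never compared directly. The function name `huffman_codes` is hypothetical.

```python
import heapq
from itertools import count


def huffman_codes(frequencies):
    """Map each symbol to its binary code string, given symbol -> frequency."""
    tie = count()  # unique tie-breaker so heap entries never compare trees
    heap = [(freq, next(tie), symbol) for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a single symbol still needs one bit

    # Building phase: merge the two lowest-frequency trees until one remains.
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))

    # Query phase support: record the path to every leaf.
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):  # internal node: (left, right)
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                        # leaf: a symbol
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes
```

Each merge creates one new internal node, so building a code for `n` symbols creates `n - 1` internal nodes in addition to the `n` leaves. Allocation counts of this kind are what a memory bound for such a program has to account for.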


Results  Table  5.3  lists  the  results  from  this  set  of  experiments.  In  each  case,  the  bound 
on  allocated  memory  in  terms  of  input  sizes  is  listed  along  with  the  number  of  lines  of 
code  in  the  example. 
Program      Bound                  LOC

merge        8 * n_1 + 8 * n_2       80
prio         8 * n                   56
packet       12 * n + 8              95
huffman      52 * n - 12            202
bst_dict     24 * n                 142

Table 5.3: Heap bounds and lines of code.


Numeric programs were produced for all examples and bounds were inferred by the
bounds inference algorithm for all examples except huffman. For huffman, the numeric
program was too large for the bounds analysis tool, indicating a need for better
methods of simplifying the numeric abstraction and eliminating unnecessary instrumentation
variables.


5.11.3  Summary  and  Challenges 

Our  implementation  demonstrates  the  viability  of  this  approach  for  reasoning  about  safety 
and  liveness  of  heap-manipulating  programs.  However,  there  are  still  issues  to  be  solved 
before  such  an  approach  can  scale  to  large  programs.  The  biggest  issue  is  the  size  of  the 
numeric  programs  that  are  generated.  The  algorithm  presented  in  this  dissertation  and 
implemented  in  Thor  produces  a  number  of  temporary  variables  that  could  potentially  be 
eliminated,  either  with  a  post-processing  pass  or  during  the  instrumentation  process.  Extra 
variables  generally  degrade  performance  of  the  analysis  tools  that  we  run  on  the  numeric 
programs.  Finding  a  general  method  for  eliminating  these  temporary  variables  is  ongoing 
work. 

Another contributor to the size of the numeric program is the disjunction and
subsequent extra branching that is introduced by the analysis. This is hard to avoid, as much of
it is needed for the memory safety proof. Better abstraction procedures and better abstract
domains that benefit shape analysis also provide an immediate benefit to an algorithm
such as the one in Thor, which is heavily based on these techniques. A smaller state
space during the memory safety proof translates directly to a smaller numeric program.
Much progress has been made in terms of abstract domains for shape analysis that permit
more concise proofs of memory safety [Yang et al., 2008], so we are optimistic that there
is room for improvement in numeric program size based on these techniques.

It may also be worth investigating whether performing additional abstraction on the
numeric program would help with these issues. For example, abstract interpretation
methods could possibly be used to simplify the update relations involved. Such investigations
are left to future work.




Chapter  6 
Related  Work 


We  now  present  some  background  material  and  describe  existing  work  in  the  area  of 
static  analysis  for  heap-manipulating  programs,  termination  proving  of  such  programs, 
and  translations  from  heap  programs  to  numeric  programs. 


6.1  Approaches  to  Analyzing  the  Heap 

First, we will discuss various approaches to reasoning about imperative programs that
manipulate the heap and highlight the advantages that separation logic provides over previous
methods.

Alias Analysis The simplest static analysis for programs that use the heap is an alias
analysis [Shapiro and Horwitz, 1997b, Landi and Ryder, 1992]. These analyses fall into
the general category of data-flow analysis and originate from the compiler community. At
each program point, a set of equivalence classes is computed. Depending on the
analysis, these equivalence classes either represent variables that must alias or those that may
alias [Deutsch, 1994]. This information is useful for code optimization, but also when
doing program verification. For example, consider the sequence of commands [x] =
3; [y] = 4, where we use brackets to indicate dereferencing. This results in a state
where $(x = y) \wedge [y] = 4$ if x and y must alias. If they are known to not alias, it results
in $[x] = 3 \wedge [y] = 4$. And if they may alias, we must consider that both possibilities could
hold. That is, the postcondition would be $(x = y \wedge [y] = 4) \vee ([x] = 3 \wedge [y] = 4)$. In general,
if n variables may alias, we must consider $2^n$ cases (in each case assuming that a distinct
subset of the variables alias). This quickly becomes intractable even for small n. And n
is generally not small, particularly when dynamic allocation and deallocation are involved
[Shapiro and Horwitz, 1997a]. It should be noted that the imprecision of alias analysis is
not a problem for compiler transformations. If the alias analysis results are too imprecise
to be useful, the compiler simply forgoes any alias-based optimizations it would otherwise
apply. Thus, for compiler optimizations, it provides a good tradeoff between usefulness of
results and analysis time.
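The case split forced by may-alias information can be seen by running the two stores under each aliasing assumption. The following sketch models the heap as a dictionary from addresses to values. The function names and the concrete addresses 0 and 1 are illustrative assumptions.

```python
from itertools import combinations


def run_stores(x_addr, y_addr):
    """Execute [x] = 3; [y] = 4 and return the final contents ([x], [y])."""
    heap = {}
    heap[x_addr] = 3  # [x] = 3
    heap[y_addr] = 4  # [y] = 4
    return heap[x_addr], heap[y_addr]


def may_alias_outcomes():
    """Both outcomes that must be considered when x and y may alias."""
    return {run_stores(0, 0), run_stores(0, 1)}


def count_alias_cases(n):
    """Number of subsets of n variables, one case per subset assumed to alias."""
    variables = range(n)
    return sum(1 for k in range(n + 1) for _ in combinations(variables, k))
```

With aliasing, the second store overwrites the first and both reads return 4. Without it, the reads return 3 and 4. `count_alias_cases(n)` enumerates the subsets explicitly and agrees with `2 ** n`, which is why the case analysis becomes impractical as `n` grows.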


Shape Analysis Shape analysis is the next step up in precision for the analysis of
programs that manipulate the heap. Rather than tracking alias sets of variables, it tracks
invariants of pointer structures. For example, in the case of doubly-linked lists, a shape
analysis would check the fact that if the forward link of memory cell a points to cell b, then
the back link of cell b points to cell a. Shape properties also encompass heap reachability
properties. Continuing with the example of linked lists, we might want to track whether
the list is null-terminated. That is, whether a cell holding the value null is reachable from
the head of the list by following “next” pointers.


TVLA One of the most thoroughly-studied shape analysis frameworks is TVLA
(Three-Valued Logic Analysis) [Sagiv et al., 2002]. As the name suggests, it is based on using
a three-valued logic to represent abstract states. More specifically, the logical foundation
consists of first-order logic with transitive closure. The set of individuals corresponds to
the set of heap cells, and unary predicates are used to record which cell a stack variable
points to. So, for example, if x and y are pointer-valued variables in the program, we would
have two predicates $p_x$ and $p_y$. If x and y alias, then this situation would be represented by
the formula $\exists c.\ p_x(c) \wedge p_y(c)$. Fields are represented by binary predicates, $f(a, b)$, where
$f$ is the field name, a is a memory cell with field $f$, and b is the cell pointed to by the $f$
field of a (or equivalently, b is the value stored in the $f$ field of a). So if x is a pointer to a
record that contains a next field, and the next field points to the same memory location as
y, this would be written $\exists c, d.\ p_x(c) \wedge \mathit{next}(c, d) \wedge p_y(d)$. The analysis itself uses models
rather than formulas to represent the program state at each point. The effect is the same in
that abstract states in both approaches represent sets of concrete states.
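The two formulas in this paragraph can be evaluated directly over a small concrete structure. The sketch below is two-valued only: it does not model TVLA's third truth value or summary nodes, and it is not TVLA's input format. The dictionary layout and the function names `aliases` and `field_points_to` are assumptions made for the example.

```python
def aliases(structure, x, y):
    """Evaluate: exists c. p_x(c) and p_y(c)."""
    p_x = structure["points_to"][x]
    p_y = structure["points_to"][y]
    return any(c in p_x and c in p_y for c in structure["cells"])


def field_points_to(structure, x, field, y):
    """Evaluate: exists c, d. p_x(c) and field(c, d) and p_y(d)."""
    p_x = structure["points_to"][x]
    p_y = structure["points_to"][y]
    return any(c in p_x and d in p_y for (c, d) in structure["fields"][field])


# Two heap cells; x and z point to c0, y points to c1, and c0.next = c1.
example = {
    "cells": {"c0", "c1"},
    "points_to": {"x": {"c0"}, "y": {"c1"}, "z": {"c0"}},
    "fields": {"next": {("c0", "c1")}},
}
```

On `example`, `aliases(example, "x", "z")` holds, `aliases(example, "x", "y")` does not, and `field_points_to(example, "x", "next", "y")` holds, mirroring the two formulas given in the text.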


Shape  Analysis  Based  on  Separation  Logic  As  part  of  this  thesis,  I  present  a  shape 
analysis  based  on  separation  logic,  which  we  originally  described  in  [Magill  et  al.,  2006]. 
Similar  analyses  have  also  been  presented  in  [Distefano  et  al.,  2006]  and  [Chang  et  al., 
2007].  Significant  advances  to  the  style  of  analysis  we  utilize  are  present  in  [Berdine 
et  al.,  2007]  and  [Calcagno  et  al.,  2009].  Berdine  et  al.  [2007]  give  a  framework  with 
support  for  inferring  the  predicates  necessary  to  describe  higher-order  structures,  such  as 
lists-of-lists.  Calcagno  et  al.  [2009]  give  a  procedure  for  using  bi-abduction  to  infer  not 
only  invariants  and  post-conditions,  but  also  preconditions.  This  helps  to  eliminate  the 
need  for  any  programmer-supplied  annotations. 

Other work includes [Chang et al., 2007], which gives a shape analysis framework that
allows data structures to be defined by routines for checking their structural invariants.
Chang et al. have extended their approach to support numeric invariants of data
structures [Chang and Rival, 2008], but not via reduction to numeric programs. Guo et al.
[2007] give a method of automatically inferring the appropriate inductive definitions based
on the code being analyzed. Finally, Distefano and Parkinson [2008] give a shape
analysis with support for user-provided rewrite rules, although the rules are not automatically
generated from inductive definitions, as they are in our implementation.

There has also been previous work on extending shape analysis with support for
tracking integer properties. Calcagno et al. [2006] handle the case where arithmetic is allowed
in the domain of the heap. For approaches based on TVLA, there is the work of Beyer et al.
[2006]. Rugina [2004] develops an analysis targeting balance properties of tree-shaped data
structures. Nguyen et al. [2007] present a verification condition-based procedure that can
handle shape plus size properties when loop invariants and pre- and post-conditions are
provided. However, none of these use the method described here of generating numeric
programs as an intermediate step in the verification process.


Relation with TVLA There are some similarities between these approaches and TVLA.
For example, they can all be described using the framework of abstract interpretation.
Also, their approach to abstraction is similar in that they all have operations that can be
seen as folding and unfolding of an inductive definition of the data structure. However,
there are marked differences as well. In TVLA, one describes a data structure by stating a
number of properties of that structure. For example, a list is defined in terms of the basic
predicates for stack variables and field dereference plus reachability and cyclicity.
Reasoning about doubly-linked lists requires the addition of predicates relating dereferences of
“forward” and “back” fields. In the shape analysis based on separation logic that we presented
as part of this thesis, the data structure as a whole is defined inductively. We believe this
allows for a more straightforward definition from the user’s point of view.

On  the  other  hand,  there  are  also  advantages  to  the  TVLA  approach.  Because  it  tracks 
individual  data  structure  properties,  rather  than  descriptions  of  specific  structures,  it  is 
more  general  than  the  approach  followed  in  our  work.  When  faced  with  a  data  structure 
that  was  not  considered  when  defining  the  instrumentation  predicates,  it  may  still  be  able 
to  provide  some  information. 

Another notable difference between the two approaches is in their treatment of disjoint
data structures. In TVLA, two structures that do not overlap are described by explicitly
stating that elements in one are not reachable from elements in the other. The treatment
based on separation logic has support in the logic for expressing disjointness, but no
explicit support for expressing reachability (instead, reachability information is implicitly
encoded in the inductive definitions we use for data structures). Taking disjointness as a
fundamental property allows for local reasoning, which has advantages in terms of
scalability of the analysis.
6.2 Termination Proving


Termination  proving  for  heap-manipulating  programs  has  been  described  in  Loginov  et  al. 
[2006a]  and  Podelski  et  al.  [2008].  Both  of  these  approaches  utilize  a  different  shape 
analysis  framework  and  Loginov  et  al.  [2006a]  does  not  involve  the  production  of  numeric 
abstractions,  instead  incorporating  a  rank-finding  algorithm  directly  in  the  analysis. 

The work in Podelski et al. [2008] does involve the production of numeric
abstractions, but they are produced from counter-example traces generated by the termination
analysis and used to communicate with the heap analysis, which is run only on-demand.
By contrast, we convert an entire program to a numeric abstraction before doing any
termination analysis, which permits a looser coupling between the termination tool and the
shape analysis tool.

Brotherston et al. [2008a] give a method of showing termination
of programs using separation logic, based on the notion of cyclic proofs. However, they
do not give a static analysis capable of automatically generating these proofs. It is also
not clear that such an approach can handle cases where more complicated termination
arguments, such as lexicographic orderings, are needed.

Berdine et al. [2006] present a method for using a separation logic shape
analysis to prove termination. However, that work is tied to a specific, rather weak abstract
domain for tracking size changes. The approach described here is able to obtain much
more precise information by tracking the actual change in data structure size rather than
only the presence and direction of change.

The closest work to ours is that of Bouajjani et al. [2006], which gives
a bi-simulation between programs manipulating singly-linked lists and counter automata,
and Habermehl et al. [2007], which provides a termination result for trees
by relating tree-manipulating programs to tree automata. By focusing on specific data
structures, these papers are able to obtain very precise results. In our work, we obtain a
simulation result rather than bi-simulation, but the result holds of arbitrary
inductively-defined data structures.
6.3 Program Logics

In this section we discuss related work in logics for reasoning about programs and, in
particular, logics with a notion of auxiliary variables, logics designed to relate two programs,
and logics designed for goto languages.

Auxiliary Variables Our instrumentation variables are similar in usage to auxiliary
variables in Hoare logic [Owicki and Gries, 1976]. Neither auxiliary variables nor
instrumentation variables are permitted to affect the values of the original variables or the
control flow of the original program. However, deciding whether one program has been
derived from another by the addition of auxiliary variables is a purely syntactic operation.
Our rules for placing commands involving instrumented variables are based in part on the
invariant that holds at the point where the command is being added. The process of
instrumenting a program can also change the structure of the code by inserting or removing
branches. As such, there is not a simple syntactic relationship between the two programs.
Our treatment of existential quantifiers also differentiates our work as mentioned above
and in Chapter 4. By virtue of the fact that we are relating two programs and focusing
on simulation as the defining concept for soundness, we obtain rules that relate existential
quantification to nondeterministic assignment and disjunction to nondeterministic choice
in a novel way.

History Variables History variables [Abadi and Lamport, 1988] are a generalization of
auxiliary variables. An augmented transition system is obtained from an original transition
system via the addition of history variables if the systems satisfy properties H1-H5 in
Abadi and Lamport [1988], the first four of which informally correspond to the following.

H1. The state space of the augmented system consists of the state space of the
original plus the addition of some new variables.

H2. Initial states in the original system and augmented system agree on the
values of the original variables.

H3. If the augmented system takes a step, and we project out the new variables,
then this corresponds to a step in the original system.

H4. The augmented system can simulate any step of the original system.


The  condition  H5  specifies  how  fairness  constraints  for  the  properties  of  these  systems 
should  be  related,  and  we  omit  it  here  since  it  does  not  constrain  the  transition  systems. 
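Conditions H3 and H4 can be checked by enumeration on a small finite example. The sketch below is an informal reading of the two conditions as stated above, not the formal definitions of Abadi and Lamport. The original system is a counter modulo 3, the augmented state adds a step counter `h` that saturates at `H_MAX` so the state space stays finite, and all names are hypothetical.

```python
ORIG_STATES = range(3)
H_MAX = 4  # saturation bound, an assumption made to keep the example finite


def orig_steps(x):
    """Original system: a counter that cycles through 0, 1, 2."""
    return {(x + 1) % 3}


def counting_steps(state):
    """Augmented system: h records how many steps were taken, up to H_MAX."""
    x, h = state
    return {(x2, min(h + 1, H_MAX)) for x2 in orig_steps(x)}


def blocking_steps(state):
    """A faulty augmentation that refuses to step once h reaches H_MAX."""
    x, h = state
    if h == H_MAX:
        return set()
    return {(x2, h + 1) for x2 in orig_steps(x)}


def aug_states():
    return [(x, h) for x in ORIG_STATES for h in range(H_MAX + 1)]


def satisfies_h3(aug_steps):
    """Projecting any augmented step onto x gives a step of the original."""
    return all(x2 in orig_steps(x)
               for (x, h) in aug_states()
               for (x2, _) in aug_steps((x, h)))


def satisfies_h4(aug_steps):
    """Every original step can be matched from every augmented state."""
    return all(any(nxt[0] == x2 for nxt in aug_steps((x, h)))
               for (x, h) in aug_states()
               for x2 in orig_steps(x))
```

The saturating counter passes both checks. The blocking variant passes H3 but fails H4, because the added variable restricts the behaviour of the original system, which is exactly what history variables are not allowed to do.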

In this thesis, we have proved H1, H2, and H4 for our instrumented programs. We do
not give a formal treatment of H3 for instrumented programs, though we conjecture that
it holds. In either case, clearly our instrumented variables have much in common with
history variables.

If  H3  holds,  one  could  view  our  theory  of  instrumented  programs  as  giving  a  particular 
method  of  adding  history  variables  to  heap-manipulating  programs  using  separation  logic 
annotations  to  guide  the  process.  As  with  auxiliary  variables,  the  connection  between 
added  variables  and  existential  quantification  in  the  separation  logic  formulae  is  novel. 
The  conditions  above  on  history  variables  give  another  clue  as  to  why  such  a  connection  is 
reasonable.  Existential  quantification  is,  in  a  sense,  the  logical  analogue  of  the  projection 
operation  referenced  in  H3  and  H4. 


Relating Programs The concept of relating two programs at different levels of
abstraction is used heavily in the area of program refinement [Wirth, 1971]. However, the goal of
our work, and thus the approach, is different. In program refinement, the goal is typically
to start from a high-level description of the program and produce successively lower-level
refinements until a concrete implementation is reached. By contrast, our goal is to take a
concrete implementation and produce a more abstract version. Furthermore, the relation
between the two programs in our approach is looser than would generally be acceptable in
a program refinement context. This is motivated and justified by our goal of passing the
numeric abstractions to automated program verification tools.

Another  approach  to  relating  programs,  based  on  a  relational  version  of  Hoare  logic, 
is given in [Benton, 2004]. The goal is to relate two programs when their total correctness
properties are the same. In our work, since we are only concerned with obtaining an
over-approximation  of  the  original  program,  the  numeric  program  may  diverge  in  cases 
where  the  original  program  terminates.  We  also  are  able  to  get  by  with  a  logic  where  the 
annotations  represent  sets  of  states  rather  than  relations.  Indeed,  the  main  goal  of  our  work 
is  to  offload  the  relational  reasoning  to  separate  analysis  tools. 

Yang  [2007]  gives  a  relational  logic  like  Benton’s  for  separation  logic  and  uses  it  to 
prove that the Schorr-Waite graph marking algorithm is equivalent to a depth-first traversal.
This  approach  differs  from  ours  in  that  we  are  only  concerned  with  preserving  properties 
of  the  stack  variables,  whereas  the  logic  Yang  presents  tracks  relations  between  heaps  as 
well.  The  other  main  difference  is  that  we  are  focused  on  a  logic  that  can  be  automated 
and  a  means  of  automating  it,  whereas  the  logic  in  [Yang,  2007]  is  currently  only  suitable 
for  by-hand  proofs. 

Our  treatment  of  existential  quantifiers  is  also  a  key  difference  between  this  work  and 
other work in logics for relating programs. Because we state soundness in terms of simulation,
we are able to use the Exists rule, which is explained in Chapter 4, Figure 4.1, to
insert and update variables representing values that are quantified in the original program
proof.  We  thus  obtain  information  about  how  quantified  values  change  without  resorting 
to  relational  invariants. 


Verification of Goto Languages Clint and Hoare [1972] present a logic
for functions that can be interrupted by goto. Here the idea is already present of viewing
"goto" as a special type of function that is known to never return. This is essentially the
same as our treatment, where gotos are viewed as executing a continuation. The proof
system that Clint and Hoare develop handles the goto construct by allowing the program
prover to assume that the triple {Q} goto l {false} holds of any goto statement, where
Q is a precondition associated with label l. In this thesis, we note the redundancy of the
post-condition for a goto statement and instead work solely with preconditions. A more
significant difference exists in the general approach of Clint and Hoare [1972] versus the
approach taken here. Clint and Hoare view gotos as exceptional cases in an otherwise
well-structured program. We instead view gotos as the main control flow construct and provide
no support for structured control constructs such as while loops. This has the advantage
of making the treatment extremely uniform. Arbib and Alagic [1979] and de Bruin [1981]
also present similar systems for proving partial correctness of goto programs and note the
connection to continuations.


Chapter 7
Conclusion 


In this thesis, we have done the following:

1.  Developed  a  logic  of  instrumentation  for  relating  a  heap-manipulating  program  to  a 
numeric  abstraction,  which  tracks  how  numeric  properties  of  the  data  structures  are 
changing. 

2.  Developed  a  static  analysis  algorithm  that  generates  numeric  abstractions,  the 
soundness  of  which  is  justified  using  the  logic  of  instrumentation. 

3.  Implemented  the  static  analysis  and  used  this  implementation  to  prove  properties  of 
programs  of  various  sizes  and  operating  over  various  data  structures. 

We  now  discuss  each  of  these  items  in  turn,  summarizing  our  contributions  and  remaining 
future  work  in  each  area. 


7.1  Logic  of  Instrumentation 

The  logic  we  developed  in  Chapter  4  gives  a  program  proving  method  based  on  adding 
additional  variables  to  the  program.  The  basic  judgment  in  the  logic  relates  a  program 
to an instrumentation of that program. This instrumentation consists of the commands
from the original program plus some additional commands and branches involving new
variables  not  present  in  the  original  program. 

This proof system is adapted to proving properties preserved by simulation and thus
has a different character than most traditional logics based on pre- and post-condition
reasoning. In particular, the simulation-based view of verification has led us to elevate
nondeterminism to a more prominent role. We obtain proof rules that use nondeterministic
choice in the language to encode disjunctions from the logic and which use nondeterministic
assignment to capture existential quantification.

The logic is proved sound, where the notion of soundness is that if two programs are
related by the logic, then a simulation relation exists between them. The direction of
simulation is such that the instrumented program is an abstraction of the original program, and
the notion of simulation is stuttering simulation. This implies that all LTL\X properties
that hold of the instrumentation also hold of the original program. We define a version
of LTL\X where the state properties can contain separation logic formulae. These formulae
are then shown to be invariant under stuttering equivalence and thus respect stuttering
simulation.
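The role of stuttering can be illustrated on finite trace prefixes. The sketch below is an assumption-laden illustration, not the thesis's definition: it treats two finite traces as stuttering equivalent when they coincide after each run of consecutive states with the same observation is collapsed to a single state. The function names and the use of an `observe` function to stand in for the equivalence relation on states are hypothetical.

```python
from itertools import groupby


def destutter(trace, observe=lambda state: state):
    """Collapse each run of consecutive states with equal observations."""
    return [obs for obs, _ in groupby(map(observe, trace))]


def stutter_equivalent(t1, t2, observe=lambda state: state):
    """Compare two finite traces up to repetition of observations."""
    return destutter(t1, observe) == destutter(t2, observe)


# A trace of an original program over the variable x.
original = [{"x": 0}, {"x": 1}, {"x": 2}]

# A trace of a hypothetical instrumentation: extra steps update the
# added variable k without changing x, so x "stutters".
instrumented = [
    {"x": 0, "k": 0},
    {"x": 0, "k": 1},
    {"x": 1, "k": 1},
    {"x": 1, "k": 2},
    {"x": 2, "k": 2},
]


def only_x(state):
    """Observe only the original variable, ignoring instrumentation."""
    return state["x"]


same_up_to_stuttering = stutter_equivalent(original, instrumented, only_x)
```

A property that uses the next-time operator X can count steps and so can tell these two traces apart, which is why the preserved fragment is LTL without X.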


Future  Work  We  only  considered  the  soundness  question  in  the  work  presented  here. 
A  remaining  open  question  is  what  can  be  attained  in  terms  of  completeness.  There  are 
many  possible  questions  to  investigate  here.  Bouajjani  et  al.  [2006]  obtain  a  bi-simulation 
result  for  list  programs  and  counter  automata,  implying  that  our  logic  of  instrumentation  or 
something  similar  could  potentially  be  shown  complete  for  this  class  of  programs.  It  would 
also  be  interesting  to  investigate  completeness  results  that  are  relative  to  completeness  of 
the  underlying  shape  analysis. 

The instrumentation variables which we add when constructing instrumented programs
function similarly to auxiliary variables [Owicki and Gries, 1976], but are less restricted in
their interactions with existing program variables and control flow. Such variables may be
useful in other situations where auxiliary variables are used, such as in proofs of parallel
programs.

Finally, considering under-approximations would provide a means of proving non-termination
and other properties that are existentially quantified over paths. Combined,
over- and under-approximation could potentially allow the sound handling of a more
expressive temporal logic such as CTL*.


7.2  Analysis  Algorithm 

We also presented an automated analysis based on the logic just described. This corresponds
to a restricted subset of the derivations in the logic of instrumentation and its
soundness  is  justified  by  showing  that  a  derivation  in  this  logic  exists  for  every  output 
returned  by  the  analysis. 

The  analysis  is  based  on  a  shape  analysis  that  uses  separation  logic  to  represent  abstract 
states.  In  the  process  of  describing  how  to  automatically  add  instrumentation  commands, 
we  also  show  how  we  can  automatically  obtain  shape  invariants  for  data  structures. 

Our  analysis  accepts  user-provided  descriptions  of  inductive  data  structures  and  uses 
these  during  the  shape  analysis  and  instrumentation  process.  By  altering  these  description 
files,  the  user  can  add  support  for  new  inductive  data  structures  or  change  the  notion  of 
size  that  is  tracked  by  the  instrumentation  variables. 


Future  Work  The  numeric  programs  that  are  produced  by  the  automated  analysis  can 
sometimes  be  quite  large.  However,  generally  a  much  shorter  proof  is  possible  according 
to  the  logic  presented  in  the  first  part  of  the  thesis.  Adding  optimizations  and  simplification 
passes  to  the  analysis  in  order  to  have  it  produce  a  numeric  program  closer  to  the  short 
program  that  a  human  can  often  discover  is  an  ongoing  challenge.  That  this  issue  arises 
is  not  surprising  since  the  same  issue  arises  with  shape  analysis  using  separation  logic. 
In  that  case,  the  invariants  discovered  automatically  are  often  more  complex  than  those 
discovered  by  hand  and  finding  better  abstract  domains  that  permit  the  discovery  of  these 
simpler invariants has clear benefits in terms of scalability of the approach. Much progress
has been made in this direction for the pure shape analysis problem [Yang et al., 2008], so
we  are  optimistic  that  similar  improvements  may  be  possible  for  instrumentation  analyses. 


7.3  Implementation 

We  implemented  the  analysis  algorithm  described  above  and  ran  experiments  involving  a 
number  of  programs  over  a  variety  of  data  structures,  including  composite  data  structures 
such  as  lists  of  trees.  The  implementation  analyses  C  code  and  generates  a  new  C  language 
program  that  is  a  numeric  abstraction  of  the  input.  Support  for  various  data  structures  is 
implemented  by  defining  a  language  of  inductive  specifications,  which  describe  inductive 
properties  of  the  data  structures.  For  example,  a  description  of  a  doubly-linked  list  would 
specify  that  it  can  be  unfolded  from  the  front  or  the  back  and  that  the  concatenation  of  two 
list  segments  is  also  a  list. 
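To make the idea of a numeric abstraction concrete, here is a small hand-written sketch. It is not output of the implementation, which works on C; the names `Node`, `build_list`, `traverse`, and `traverse_abstraction` are hypothetical. The heap-manipulating program walks a singly-linked list, and the abstraction replaces the list with an integer n standing for the length of the segment from x to nil.

```python
class Node:
    """A singly-linked list cell with a single next field."""

    def __init__(self, next_node=None):
        self.next = next_node


def build_list(length):
    """Build a nil-terminated list with the given number of cells."""
    head = None
    for _ in range(length):
        head = Node(head)
    return head


def traverse(head):
    """Heap-manipulating program: walk the list, counting iterations."""
    steps = 0
    x = head
    while x is not None:
        x = x.next
        steps += 1
    return steps


def traverse_abstraction(n):
    """Numeric program over n, the length of the segment from x to nil.

    The test x != nil becomes n > 0, and x := x.next becomes n := n - 1.
    """
    steps = 0
    while n > 0:
        n = n - 1
        steps += 1
    return steps
```

Termination of `traverse_abstraction` follows from n decreasing and being bounded below by zero, which is the kind of argument a termination prover for integer programs can find; the corresponding fact about `traverse` would otherwise require reasoning about the heap.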

The implementation is written in OCaml and uses CIL to parse the C code provided
as  input.  Yices  is  used  to  prove  pure  entailments  and  an  implementation  of  the  frame 
inference  procedure  described  in  Section  5.5.3  is  used  to  reason  about  spatial  formulae.  A 
number  of  optimizations  and  command  line  options  affecting  analysis  behavior  have  been 
incorporated  into  the  implementation  in  order  to  efficiently  handle  a  larger  set  of  programs. 

Future Work A great deal of implementation efficiency comes down to heuristics. For
example, quick checks that indicate an implication is not provable, and save the time
required to do a full proof search, can significantly decrease program analysis time.
Heuristics for generating abstraction patterns from inductive specifications and choosing
good points at which to apply abstraction are also important. For example, suppose
we have an inductive definition for a list segment and are analyzing a loop that
generates a null-terminated list at x. We could perform abstraction once we have a
single points-to x ↦ [next : nil], or we could wait for a pair of points-to predicates
∃z. x ↦ [next : z] * z ↦ [next : nil]. Choosing the first option results in shorter analysis
times, but sometimes prevents programs from being proved memory safe that could be
proved by taking the second approach of waiting longer before performing abstraction.

Similarly, when analyzing programs that call non-recursive functions, these functions
can be inlined and the program treated as if it were written as a single large function.
Alternatively, we can view function call sites as an opportunity to apply abstraction, which
simplifies the symbolic static formulas at that call site, but may result in too much
information loss and a failure to prove memory safety.

Currently, we choose a reasonable default for these options and provide command-line
flags that allow the user to alter the behavior of the analysis. One approach that
may  provide  a  better  solution  would  be  to  incorporate  counter-example  guided  abstraction 
refinement  [Clarke  et  al.,  2003].  This  technique,  which  originated  in  the  software  model 
checking  community,  is  based  on  the  idea  of  performing  abstraction  as  aggressively  as 
possible  but  providing  a  means  of  backtracking  and  keeping  more  precise  information  if 
this  abstraction  is  found  to  cause  problems. 

While  the  frequency  of  calls  to  abstraction  has  a  large  effect  on  the  running  time  of  the 
analysis,  the  actual  abstraction  function  used  is  at  least  as  important.  We  have  chosen  a 
relatively  simple  abstraction  function  for  our  implementation  and  exploring  other  options 
from  the  literature  may  provide  additional  improvements.  For  example,  in  [Yang  et  al., 
2008],  an  abstraction  function  is  described  that  provides  predicates  for  empty,  non-empty, 
and  possibly-empty  lists.  While  only  one  of  these  predicates  is  needed  to  reason  about 
list  programs,  including  all  of  them  allows  for  a  fairly  precise  abstraction  function  that 
still results in the small state space sizes that are usually associated with coarser abstraction
functions. In [Chang et al., 2007], an abstraction function is described that uses the
symbolic  execution  history  to  guide  the  abstraction  process.  The  current  symbolic  state  is 
compared  to  the  symbolic  state  obtained  during  the  previous  iteration  of  a  loop  and  this 
combined  information  is  used  to  guide  abstraction. 

It should be possible to incorporate techniques such as these into our instrumentation
analysis in order to further improve performance.


Appendix A
Guide  to  Notation 

A.1 Programs, States, and Transition Systems

a      The type of variables and expressions denoting addresses.
i      The type of variables and expressions denoting integers.
τ      An arbitrary type. Either a or i.
x^τ    Variable of type τ. Figure 2.1, page 16.
e^τ    Expression of type τ. Figure 2.1, page 16.
c      Command. Figure 2.1, page 16.
k      Continuation. Figure 2.1, page 16.

P      Program. Figure 2.1, page 16.

fv(t)  Free variables in some term t (t can be an expression, command,
       continuation, program, logical formula, etc.). Definitions 2, 3, and
       2.2.1.

Values  The  set  of  values.  Page  15. 

Stores  The  set  of  stores.  Page  15. 

Records      The set of records. Page 16.

Heaps        The set of heaps. Page 16.

v            An element of Values. Page 15.

s            An element of Stores. Page 15.

h            An element of Heaps. Page 16.

(s, h)       Memory State. A store, heap pair.

⟦e⟧          Denotation of expression e. A function from Stores to Values.
             Figure 2.2, page 18.

⟦c⟧          Denotation of command c. A function from Stores × Heaps to
             2^(Stores × Heaps) ∪ {error}. Figure 2.3, page 115.

G            Set of execution states. Page 24.

γ            An element of G. Page 24.

             Transition relation for continuations. A subset of G × G. Figure
             2.4, page 115.

             Transition relation for programs. A subset of G × G. Definition 13,
             page 115.

S            Transition System. A tuple of the form (A, /, F, ...). Definition
             11, page 47.

T            A trace of a transition system. Definition 12, page 48.

traces(S)    The set of traces of transition system S. Definition 48, page 48.

((P | Q0))   The transition system corresponding to program P with
             precondition Q0. Definition 14, page 48.


A.2  Relations 

R  An  arbitrary  relation. 

E  An  equivalence  relation. 

R+          The transitive closure of relation R. Definition 16, page 49.

s =_V s'    s and s' agree on the values of variables in V. Definition 1, page 17.

γ = γ'      The execution states γ and γ' agree on all but the current
            continuation. Page 89.

γ ≅_V γ'    The execution states γ and γ' include the same heap and their stores
            are =_V-related. Definition 23, page 91.

γ ≈_V γ'    The execution states γ and γ' have stores that are =_V-related. Their
            heaps are not required to be the same. Definition 24, page 93.


A.3  Separation  Logic 


p^τ             An inductive predicate name with arity τ. Also written as p when
                the arity is clear from context. Figure 2.6, page 27.

ρ               A record expression. Figure 2.6, page 27.

S               A spatial predicate. Figure 2.6, page 27.

Q               A separation logic formula. Figure 2.6, page 27.

⟦ρ⟧             The denotation of record expression ρ. A mapping from Stores to
                Records.

(s, h) ⊨_X Q    The memory state (s, h) satisfies separation logic formula Q given
                inductive predicate meanings X. Figure 2.7, page 28.

(s, h) ⊨ Q      The memory state (s, h) satisfies separation logic formula Q. Used
                when the set of inductive predicate meanings X is clear from
                context or otherwise unnecessary (all of the technical development
                is independent of the particular choice of X).


A.4  LTSL 

LTSL^E          The set of E-invariant LTSL formulae. Definition 25, page 94.

LTSL^V          The set of LTSL formulae containing only variables in the set V.
                All these formulae are ≅_V-invariant. Definition 26, page 98.

LTSL_P^V        The set of LTSL formulae containing only pure state formulas over
                variables in the set V. All these formulae are ≈_V-invariant.
                Definition 27, page 99.

                The function on LTSL formulae defined in Figure 3.7 on page 103.

S1 ~_{R,E} S2   S2 E-stuttering simulates S1 and R is the simulation relation
                witnessing this. Definition 29, page 119.

T1 <_E T2       T2 E-stuttering contains T1. Definition 21, page 86.

T1 ≈_E T2       T1 and T2 are E-stuttering equivalent. Definition 21, page 86.


Pseudo-code


We use an ML-like pseudo-code when describing our algorithms. The type system includes
the standard type constructors for tuples and option types. We also assume a "set"
type exists and use standard set notation to describe values of set type. The main language
constructs are match, let, and return.

return  simply  returns  the  value  following  it.  So  return  1  returns  the  integer  value  1. 
match  examines  a  value  and  executes  different  code  depending  on  the  form  of  the  value. 
For  example,  the  code  below  returns  1  if  c  is  an  assignment  statement  or  2  if  it  is  an 
allocation. 

match c with
  case x := e
    return 1
  case x := alloc(...)
    return 2
end

The let command is used to introduce bindings and perform pattern matching. The command
let e1 = e2 in pattern matches e2 against e1, introducing bindings if the match succeeds.
If the match fails, the match failed clause is executed. The code below returns
Some(x) if the continuation k starts with an assignment to x and returns None otherwise.


let x := e; k' = k in
  return Some(x)
match failed => return None


Finally, we note that let statements can be sequenced and let bindings of the form x = t,
where x is a variable and t is an arbitrary term, can never fail (since they involve no pattern
matching). Also, functions can be recursive. As an example, the code in Figure 9 converts
all assignment statements into non-deterministic assignments in the continuation k.


Function make_nondet(k). Pseudo-code example. Converts assignment statements
into non-deterministic assignments to the same variable.

match k with
  case c; k'
    let (x := e) = c in
      let k'' = make_nondet(k') in
      return (x := ?; k'')
    match failed =>
      let k'' = make_nondet(k') in
      return (c; k'')
  case branch e1 => k1, ..., en => kn end
    let k1' = make_nondet(k1) in
    ...
    let kn' = make_nondet(kn) in
    return branch e1 => k1', ..., en => kn' end
  case goto l return goto l
  case halt return halt
  case abort return abort
end
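For readers who prefer an executable rendering, the following is one possible transcription of make_nondet into Python. The tuple encoding of continuations and commands (the tags "seq", "assign", "nondet", "branch", "goto", "halt", "abort") is an assumption made for this sketch and does not come from the thesis.

```python
def make_nondet(k):
    """Replace every assignment in continuation k by a nondeterministic one.

    Assumed encoding (hypothetical, for illustration only):
      ("seq", c, k')           command c followed by continuation k'
      ("branch", [(e, k), ...]) guarded branches
      ("goto", l), ("halt",), ("abort",)
    Commands: ("assign", x, e) for x := e, ("nondet", x) for x := ?.
    """
    tag = k[0]
    if tag == "seq":
        _, c, rest = k
        rest2 = make_nondet(rest)
        if c[0] == "assign":
            # Pattern match succeeded: c has the form x := e.
            _, x, _e = c
            return ("seq", ("nondet", x), rest2)
        # Match failed: keep the command unchanged.
        return ("seq", c, rest2)
    if tag == "branch":
        _, arms = k
        return ("branch", [(e, make_nondet(ki)) for e, ki in arms])
    # goto, halt, and abort are returned unchanged.
    return k


example = (
    "seq", ("assign", "x", "y + 1"),
    ("branch", [
        ("x > 0", ("seq", ("assign", "y", "0"), ("goto", "l"))),
        ("x <= 0", ("halt",)),
    ]),
)
converted = make_nondet(example)
```

The if on the command tag plays the role of the let with its match failed clause, and the list comprehension covers the n branch arms that the pseudo-code writes out with an ellipsis.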
B.1 Local Functions

We  will  also  occasionally  define  functions  that  are  local  to  the  primary  function  being 
presented in a figure. The syntax for this is as below, where localfun is the name of the
local function being defined.

fun localfun(args) =
  body of local function
in
  body of primary function